______________________________________________________________________

HELP MESSAGE

(you may have received this message because your request to pythia
does not comply with the instructions below)
______________________________________________________________________

PYTHIA version 1.5 (pythia@anl.gov)

server for identification of human repetitive DNA

Aleksandar Milosavljevic            Jerzy Jurka
Genome Structure Group              Linus Pauling Institute
Argonne Nat'l Lab, Bldg. 202        440 Page Mill Rd.
Argonne, Illinois 60439-4833        Palo Alto, California, 94306
(708) 252 7860                      (415) 327 4064
milosav@anl.gov                     jurka@jmullins.stanford.edu
______________________________________________________________________

PYTHIA SOFTWARE AVAILABILITY: for information on availability of Sun
sparcstation executable sequence analysis programs used by Pythia,
send a message with the word "software" in Subject-line to
pythia@anl.gov
______________________________________________________________________

CURRENTLY AVAILABLE SERVICES:

(1) Identification of occurrences of repetitive DNA elements
    --------------------------------------------------------

Pythia identifies occurrences of the repetitive elements

REFERENCES:

(1.1) (describes the rapid similarity search method used by Pythia)
      Milosavljevic,A. "Discovering Sequence Similarity by the
      Algorithmic Significance Method", Proceedings of the First
      International Conference on Intelligent Systems for Molecular
      Biology, AAAI Press, (eds. L.Hunter, J.Shavlik, and D.Searls),
      1993.

(1.2) (describes the original set of repeats)
      J.Jurka, J.Walichiewicz, and A.Milosavljevic, "Prototypic
      Sequences for Human Repetitive DNA", J. Mol. Evol. (1992)
      35:286-291.

(1.3) (describes recently discovered repeats)
      Jurka,J., Kaplan,D.J, Duncan,C.H., Walichiewicz, J.,
      Milosavljevic,A., Murali,G. and Solus,J.F. "Identification and
      Characterization of New Human Medium Reiteration Frequency
      Repeats", Nucleic Acids Research (1993) 21:5:1273-1279.
(1.4) (describes the most recently discovered repeats)
      Iris,J.M.F., Bougueleret,L., Prieur,S., Caterina,D., Primas,G.,
      Perrot,V., Jurka,J., Rodriguez-Tome,P., Claverie,J.M.,
      Dausset,J., and D.Cohen. "Dense Alu Clustering and a Potential
      New Member of the NFkB Family Within a 90 Kilobase HLA Class III
      segment", Nature Genetics (1993) 3:137-145.

(2) Identification of subfamily membership of Alu sequences
    -------------------------------------------------------

Pythia first aligns the incoming sequences one by one against the Alu
consensus and then checks them for the presence of bases that are
diagnostic for Alu subfamilies J, Sb1, Sb2, Sc, Sp, Sq, and Sx.

REFERENCES:

(2.1) (describes the method used to reconstruct the evolution of Alus)
      A.Milosavljevic, J.Jurka, "Discovery by Minimal Length Encoding:
      A Case Study in Molecular Evolution", Machine Learning Journal,
      Special Issue on Machine Discovery, (1993) vol.12 no.1,2,3.

(2.2) (describes the Alu subfamily structure except the subdivision of
      the youngest Alu subfamily Sb into subfamilies Sb1 and Sb2)
      J.Jurka, A.Milosavljevic, "Reconstruction and Analysis of Human
      Alu Genes", J. Mol. Evol. (1991) 32:105-121.

(2.3) (describes subfamilies Sb1 and Sb2)
      J.Jurka, "A new subfamily of recently retroposed human Alu
      repeats" Nucleic Acids Research, in press.

(3) Identification of simple DNA regions
    ------------------------------------

Pythia finds segments of DNA that contain tandem repeats and other
"simple DNA" regions that contain significant repetitions of words.

REFERENCE:

(3.1) (describes the method for discovering simple repeats)
      A.Milosavljevic, J.Jurka, "Discovering Simple DNA Sequences by
      the Algorithmic Significance Method", CABIOS (1993) vol. 9,
      no. 4

To request service, put a single word describing the request ("Rpts"
for service (1), "Alu" for service (2), and "Smpl" for service (3)) in
the "Subject"-line of your message to Pythia at "pythia@anl.gov".
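
The mapping from service number to "Subject"-line word described above
can be sketched in a few lines of Python. This is only an illustration
and not part of Pythia: the names SUBJECT_FOR_SERVICE and build_request
are hypothetical, and the sketch only builds a message object with the
standard-library email package; it does not send anything.

```python
from email.message import EmailMessage

# Subject words taken from the help text; the dict name is hypothetical.
SUBJECT_FOR_SERVICE = {1: "Rpts", 2: "Alu", 3: "Smpl"}


def build_request(service, body):
    """Build (but do not send) a request message for one Pythia service."""
    if service not in SUBJECT_FOR_SERVICE:
        raise ValueError("service must be 1, 2, or 3")
    msg = EmailMessage()
    msg["To"] = "pythia@anl.gov"
    # Exactly one word in the Subject line: one service per message.
    msg["Subject"] = SUBJECT_FOR_SERVICE[service]
    msg.set_content(body)
    return msg
```

A call such as build_request(2, body) yields a message whose Subject is
"Alu"; how the message is delivered is left to the local mail setup.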
Occurrences of Alu sequences may be identified using service (1) and
their subfamily membership may subsequently be identified using
service (2) -- services cannot be requested simultaneously. Alu
subfamily is identified correctly only if the submitted Alu sequence
is in the same orientation as the Alu consensus described in
references (2.2) and (2.3).

The body of your message should contain DNA sequences in
Intelligenetics format -- the sequences should terminate with '1' and
should be preceded by a name on a line by itself. The names should be
unique and distinct from the names of repetitive elements and should
not contain underscores ('_') or blanks (' '). The names should be
preceded by at least one line that starts with a semicolon. The
sequences should consist of uppercase characters "A", "G", "C", and
"T". There should be no more than 99 characters per line. There should
be no empty lines between non-empty ones. If this format is not
obeyed, you may get nonsensical output.

Please do not send more than a total of 10,000 bases per request to
Pythia. If you have large requests, consider running Pythia programs
on your own machine -- to get more information on software
availability, send an e-mail message with the word "software" in
Subject-line to pythia@anl.gov. In response to a request longer than
10,000 bases, you may receive this help message.

Since the "Rpts" requests are much more time-demanding than the "Alu"
and "Smpl" requests, we ask you not to send more than a total of
10,000 bases of "Rpts" requests to Pythia per day. If overloaded by
too many requests, Pythia may start ignoring requests silently.

Examples and more specific instructions are described below. If you
have questions or comments, or if you do not get a response from
Pythia within 2 days, or if you get nonsensical output, please contact
pythia-admin@anl.gov.
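
The format rules above can be checked mechanically before a request is
mailed. The following Python sketch is an illustration only, not the
check that Pythia itself performs: the function names format_record
and check_request, the 70-character line width, and the wording of the
problem messages are all assumptions. It cannot check that a name is
distinct from the names of repetitive elements, since that list is not
given here.

```python
import re

MAX_LINE = 99        # "no more than 99 characters per line"
MAX_BASES = 10_000   # "no more than a total of 10,000 bases per request"
SEQ_LINE = re.compile(r"[ACGT]*1?")  # uppercase bases, optional final '1'


def format_record(name, bases, width=70):
    """Return one record: a ';' line, the name line, then sequence lines."""
    body = bases + "1"  # the sequence must terminate with '1'
    chunks = [body[i:i + width] for i in range(0, len(body), width)]
    return "\n".join([";", name] + chunks)


def check_request(body):
    """Return a list of format problems found in a request body."""
    problems, names, total = [], set(), 0
    state = "start"  # start -> comment -> sequence -> start ...
    for number, line in enumerate(body.strip("\n").split("\n"), start=1):
        if len(line) > MAX_LINE:
            problems.append(f"line {number}: longer than {MAX_LINE} characters")
        if line == "":
            problems.append(f"line {number}: empty line")
        elif state in ("start", "comment") and line.startswith(";"):
            state = "comment"
        elif state == "comment":
            # The first non-semicolon line after the comment is the name.
            if "_" in line or " " in line:
                problems.append(f"line {number}: name contains '_' or blank")
            if line in names:
                problems.append(f"line {number}: duplicate name {line}")
            names.add(line)
            state = "sequence"
        elif state == "sequence":
            if not SEQ_LINE.fullmatch(line):
                problems.append(f"line {number}: characters other than A, C, G, T")
            total += sum(line.count(base) for base in "ACGT")
            if line.endswith("1"):
                state = "start"
        else:
            problems.append(f"line {number}: expected a line starting with ';'")
    if state != "start":
        problems.append("last record is incomplete (no terminating '1')")
    if total > MAX_BASES:
        problems.append(f"{total} bases exceed the limit of {MAX_BASES}")
    return problems
```

An empty list returned by check_request means that none of the rules
covered by the sketch was violated; it does not guarantee that Pythia
will accept the request.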
EXAMPLE 1: Identification of occurrences of repetitive elements
           ----------------------------------------------------

An example of a request, starting from the "Subject"-line, follows.

Subject: Rpts
;
HUMGENE1
CTTCTTTGTGGCATTCTGCTGTCGTATACCATGTGGAACACATTAAGAACGTTATGGCCAGGCGTGTTGG
CTCACGCCTGTAATCCTAGCACTTTGGGAGGCCAAGGTGGACAGATCACCTGAGGTTGGGAGTTCGAGAC
CAGCCTGGCCAGCATGCCGAAACCCTGTCTCTACTAAAAATACAAAAATTAGCCAGGCATGGTGGCACAC
ACTTGTAATCCGAGCTACTCGGGAGGCTGAAGAAGGAGAATCGCTTAAACCCAGGAGGCGGAGGTTGCAG
TGAGCTGAGATTGCACCGTTGCAATCCAGCCTGGGCAACAGAGTGAAACTCCATCTCAGAAAAAAAAAAA
AAGTTACAATTGGGTGTCACATAGACAGTGAGGAGTAGTGGAAAGAGTGTTAGATTTGGGGTAAGAGAAC
TGTGTCTCCTGGGCTTGAGTCCTGATGCCAACTCTCACAAGATGTGTCACCGTACAGCACGACGCTTATA
CTTTTTTTTTTTTTTTTTTTGAGACAGGATCTTGCTCTGTCTCGAGCTGGAGTGCAGTGGTATGATCACA
GCTCACTGCAGCCTCAAACTCCCAGGCTCAGGCCATCCTCCCACCTCAGCCTCCCAAGTAGCTGGGACTA
CAGGCATGTACCACCATGCCCAAATAA1
;
HUMGENE2
CAATAAAATCCCAATGCTTCCGCTGCAGAAGTCCAAGAGGACATGACTGCGGCTCCATCTAGTCAAGCCC
AGGGCAGGAATTCCCTTCCAGGAAACCAAGCCAGAGCGCTGTGGTCTCTGGGCTGCCAAGATGTCTCAGA
CAATGGTCTAGCCCTTCAGCCCACAGAAATCTCTGGGCAAAATTATCTCCCAGCATTGACAGACGAATGG
ATAAACAAAATGTGTTCTATCCCACAGTGGAGTATTATTCAGCTTTAAAAAGGAAGGAAAAATGCTGGGC
GCGGTGGCTCACACCTTGATCCCAGCACTTTTGGGAGGCTGAGGAAGGAGGATCATTTGTGCCCAGGAGT
TCGAGACTAGCCTGGACAACATAGAGAAACCTTGTCTCTACACACACACACACACACACACACACACACA
CTCTCTCTCTCTCTCTCTCTCAGCCAGGCACAGTGGCACATGCCTGAAGTCCCAGCTCTGGGAAGCTGAG
GCAGGAGGATCTCTTGAGCCTGGTGGGTCAAGGCTGCAGTGAACCATGTTCATGCCACTGCACTCCAGTC
TGGATGACAGAGCGAGACCTAGTCTCAAAAAAAAAAAAAAAAAAAAAAAAAAGAAAGAAGAAAGAAGAAA
GGAAAAAGAAAAAAA1

EXAMPLE 2: Identification of subfamily membership of Alu sequences
           -------------------------------------------------------

A separate entry should be created for each Alu sequence. The Alu
sequences must be in the same orientation as the Alu consensus
described in reference (2.2) or (2.3). If the orientation of an Alu
sequence is not certain, you may try sending the sequence in both
orientations.
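
Producing the second orientation of a sequence amounts to taking its
reverse complement. The Python sketch below illustrates this under
stated assumptions: the function names and the "F"/"R" name suffixes
are hypothetical (chosen so that the two entries keep unique names
without underscores or blanks), and the input is assumed to contain
only the uppercase characters A, C, G, and T, without the terminating
'1'.

```python
# Translation table mapping each base to its complement.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(bases):
    """Return the opposite-strand reading of an uppercase A/C/G/T string."""
    return bases.translate(_COMPLEMENT)[::-1]


def both_orientations(name, bases):
    """Return (name, sequence) pairs for the two orientations of one Alu."""
    return [
        (name + "F", bases),                      # orientation as given
        (name + "R", reverse_complement(bases)),  # opposite orientation
    ]
```

Each pair can then be written out as a separate entry in the request,
and the entry that matches the consensus orientation is the one whose
subfamily assignment is meaningful.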
An example of a request, starting from the "Subject"-line follows.

Subject: Alu
;
MAYBEALU1
GGCCGGGCGCGGTGGCTCACGCCTGTAATCCCAGCACTTTGGGAGGCCGAGGCGGGCGGATCACCTGAGG
TCAGGAGTTCGAGACCAGCCTGGCCAACATGGTGAAACCCCGTCTCTACTAAAAATACAAAAATTAGCCG
GGCGTGGAAACGCGCGCCTGTAATCCCAGCTACTCGGGAGGCTGAGGCAGGAGAATCGCTTGAACCCGGG
AGGCGGAGGTGGCAGTGAGCCGAGATCGCGCCACTGCACTCCAGCCTGGGCGACAGAGCGAGACTCCGTC
TCAAAAAAAA1
;
MAYBEALU2
GGCCGGAAATTGTGGCTCACGCCTGTAATCCCAGCACTTTGGGAGGCCGAGGCGGGCGGATCACCTGAGG
TCAGGAGTTCGAGACCAGCCTGGCCAACATGGTGAAACCCCGTCTCTACTAAAAATACAAAAAGGGGCCG
GGCGTGGTGGCGCGCGCCTGTAATCCCAGCTACTCGGGAGGCTGAGGCAGGAGAATCGCTTGAACCCGGG
AGGCGGGGGTTGCAGTGAGCCGAGATCGCGCCACTGCACTCCAGCCTGGGCGACAGAGCGAGACTCCGTC
TCAAAAACAAACAAACAAA1

EXAMPLE 3: Identification of simple DNA regions
           ------------------------------------

The setting of parameters is the same as described in reference (3.1).

An example of a request, starting from the "Subject"-line follows.

Subject: Smpl
;
HUMGENE1
TTCCGATAGTGGCTCAGTTTTCTACTTACATAAAAAGACAGCACATTCTCTTAGCAATATGTGTTTGTAT
GTGTGTGTGTGTGTGTGTGTGTGTGTGTATATATATATATATATATATATAATTTAGAGGCCGGAAATTG
TCAGGAGTTCGAGACCAGCCTGGCCAACATGGTGAAACCCCGTCTCTACTAAAAATACAAAAAGGGGCCG
GGCGTGGTGGCGCGCGCCTGTAATCCCAGCTACTCGGGAGGCTGAGGCAGGAGAATCGCTTGAACCCGGG
AGGCGGGGGTTGCAGTGAGCCGAGATCGCGCCACTGCACTCCAGCCTGGGCGACAGAGCGAGACTCCGTC
TCAAAAACAAACAAACAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA1
;
HUMGENE2
GGCGTGGTGGCGCGCGCCTGTAATCCCAGCTACTCGGGAGGCTGAGGCAGGAGAATCGCTTGAACCCGGG
GATAGATGATTAAATAGATGATACATAGATGATAGATAATGATAAATAGATGATAGATGATAGATGATAG
GGCGTGGTGGCGCGCGCCTGTAATCCCAGCTACTCGGGAGGCTGAGGCAGGAGAATCGCTTGAACCCGGG
GTGATAGATAGATTGATAGATGATAGAAGATTGATAGATGATAGATAGATAGATGATTAAATAGATGATA
CATAGATGATAGATAATGATAAATAGATGATAGATGATAGATGATAGGTGATAGATAGATTGATAGATGA
TAGAAGATTGATAGATGATAGATACATAGGTGATA1
________________________________________