Genome Information Research Center, Osaka Univ.

PYTHIA e-mail server

______________________________________________________________________
HELP MESSAGE
(you may have received this message because your request to pythia 
does not comply with the instructions below)
______________________________________________________________________
PYTHIA version 1.5 (pythia@anl.gov)
server for identification of human repetitive DNA

Aleksandar Milosavljevic               Jerzy Jurka
Genome Structure Group                 Linus Pauling Institute
Argonne Nat'l Lab, Bldg. 202           440 Page Mill Rd.
Argonne, Illinois 60439-4833           Palo Alto, California, 94306
(708) 252 7860                         (415) 327 4064
milosav@anl.gov                        jurka@jmullins.stanford.edu
______________________________________________________________________
PYTHIA SOFTWARE AVAILABILITY: for information on the availability of
the Sun SPARCstation executable sequence analysis programs used by
Pythia, send a message with the word "software" in the Subject-line
to pythia@anl.gov
______________________________________________________________________

CURRENTLY AVAILABLE SERVICES:

(1) Identification of occurrences of repetitive DNA elements
    --------------------------------------------------------

    Pythia identifies occurrences of the repetitive elements described
    in references (1.2), (1.3), and (1.4).
    
    REFERENCES:

    (1.1) (describes the rapid similarity search method used by Pythia)
    Milosavljevic,A. "Discovering Sequence Similarity by 
    the Algorithmic Significance Method", Proceedings of the 
    First International Conference on Intelligent Systems for 
    Molecular Biology, AAAI Press, (eds. L.Hunter, J.Shavlik, 
    and D.Searls), 1993.

    (1.2) (describes the original set of repeats)
    J.Jurka, J.Walichiewicz, and A.Milosavljevic,
    "Prototypic Sequences for Human Repetitive DNA",
    J. Mol. Evol. (1992) 35:286-291.

    (1.3) (describes recently discovered repeats)
    Jurka,J., Kaplan,D.J, Duncan,C.H., Walichiewicz, J., 
    Milosavljevic,A., Murali,G. and Solus,J.F.
    "Identification and Characterization of New Human 
    Medium Reiteration Frequency Repeats",
    Nucleic Acids Research (1993) 21:5:1273-1279.

    (1.4) (describes the most recently discovered repeats)
    Iris,J.M.F., Bougueleret,L., Prieur,S., Caterina,D., Primas,G.,
    Perrot,V., Jurka,J., Rodriguez-Tome,P., Claverie,J.M., Dausset,J.,
    and D.Cohen. "Dense Alu Clustering and a Potential New Member of the
    NFkB Family Within a 90 Kilobase HLA Class III segment",
    Nature Genetics (1993) 3:137-145.

    
(2) Identification of subfamily membership of Alu sequences
    -------------------------------------------------------
 
    Pythia first aligns the incoming sequences one by one against the 
    Alu consensus and then checks them for the presence of bases that
    are diagnostic for Alu subfamilies J, Sb1, Sb2, Sc, Sp, Sq, and Sx. 

    REFERENCES:

    (2.1) (describes the method used to reconstruct the evolution of Alus)
    A.Milosavljevic, J.Jurka, "Discovery by Minimal Length 
    Encoding: A Case Study in Molecular Evolution", Machine Learning
    Journal, Special Issue on Machine Discovery, (1993) vol.12 no.1,2,3.

    (2.2) (describes the Alu subfamily structure except the subdivision
    of the youngest Alu subfamily Sb into subfamilies Sb1 and Sb2)
    J.Jurka, A.Milosavljevic, "Reconstruction and Analysis of
    Human Alu Genes", J. Mol. Evol. (1991) 32:105-121.

    (2.3) (describes subfamilies Sb1 and Sb2)
    J.Jurka, "A new subfamily of recently retroposed human Alu repeats"
    Nucleic Acids Research, in press.


(3) Identification of simple DNA regions
    ------------------------------------

    Pythia finds segments of DNA that contain tandem repeats and
    other "simple DNA" regions that contain significant repetitions 
    of words.

    REFERENCE:

    (3.1) (describes the method for discovering simple repeats)
    A.Milosavljevic, J.Jurka, "Discovering Simple DNA Sequences
    by the Algorithmic Significance Method", CABIOS (1993) vol. 9, no. 4


To request service, put a single word describing the request ("Rpts"
for service (1), "Alu" for service (2), and "Smpl" for service (3)) 
in the "Subject"-line of your message to Pythia at "pythia@anl.gov".
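
The request convention above can be sketched in Python as follows. This
is only an illustration, not part of Pythia: the function name
build_request, the dictionary keys, and the sender address are
hypothetical, and the sketch builds the message without sending it.

```python
from email.message import EmailMessage

# Subject words come from the help text; the dictionary keys are
# illustrative names chosen for this sketch.
SUBJECT_FOR_SERVICE = {
    "repeats": "Rpts",  # service (1)
    "alu": "Alu",       # service (2)
    "simple": "Smpl",   # service (3)
}


def build_request(service, body, sender):
    """Return an EmailMessage that requests exactly one Pythia service."""
    if service not in SUBJECT_FOR_SERVICE:
        raise ValueError(f"unknown service: {service!r}")
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = "pythia@anl.gov"
    # One single word in the Subject-line selects the service.
    msg["Subject"] = SUBJECT_FOR_SERVICE[service]
    msg.set_content(body)
    return msg
```

Because the Subject-line carries a single word, each message built this
way requests one service only.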

Occurrences of Alu sequences may be identified using service (1) and
their subfamily membership may subsequently be identified using
service (2) -- services cannot be requested simultaneously.

The Alu subfamily is identified correctly only if the submitted Alu
sequence is in the same orientation as the Alu consensus described
in references (2.2) and (2.3).
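
If a sequence is in the opposite orientation, its reverse complement
can be submitted instead. A minimal Python sketch follows; the function
name reverse_complement is hypothetical and the code is not part of
Pythia.

```python
# Translation table mapping each base to its complement.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase A/C/G/T sequence."""
    unexpected = set(seq) - set("ACGT")
    if unexpected:
        raise ValueError(f"unexpected characters: {sorted(unexpected)}")
    # Complement every base, then reverse the string.
    return seq.translate(_COMPLEMENT)[::-1]
```

Applying the function twice returns the original sequence, which is a
quick sanity check before submitting.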

The body of your message should contain DNA sequences in
Intelligenetics format -- each sequence should terminate with '1' and
should be preceded by a name on a line by itself. The names should be
unique and distinct from the names of repetitive elements and should
not contain underscores ('_') or blanks (' ').  Each name should be
preceded by at least one line that starts with a semicolon.  The
sequences should consist of the uppercase characters "A", "G", "C",
and "T". There should be no more than 99 characters per line. There
should be no empty lines between non-empty ones. If this format is not
obeyed, you may get nonsensical output.
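
The format rules above can be sketched as a small Python formatter.
This is an illustration under stated assumptions, not Pythia code: the
names format_entry and format_request and the default width of 70 are
hypothetical, and the sketch does not check names against the names of
repetitive elements.

```python
def format_entry(name, seq, width=70):
    """Format one sequence as: semicolon line, name line, sequence lines."""
    if not name or "_" in name or " " in name:
        raise ValueError("name must be non-empty, without '_' or blanks")
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("sequence must consist of uppercase A, G, C, T")
    # The terminating '1' is appended to the last line, so the width is
    # capped at 98 to keep every line within 99 characters.
    if not 1 <= width <= 98:
        raise ValueError("width must be between 1 and 98")
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    lines[-1] += "1"
    return "\n".join([";", name] + lines)


def format_request(entries, width=70):
    """Format (name, sequence) pairs as a message body with no empty lines."""
    names = [name for name, _ in entries]
    if len(set(names)) != len(names):
        raise ValueError("names must be unique")
    return "\n".join(format_entry(n, s, width) for n, s in entries) + "\n"
```

For instance, format_entry("HUMGENE1", "ACGTACGT", width=5) yields the
four lines ";", "HUMGENE1", "ACGTA", and "CGT1".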

Please do not send more than a total of 10,000 bases per request to
Pythia. If you have large requests, consider running Pythia programs
on your own machine -- to get more information on software
availability, send an e-mail message with the word "software" in the
Subject-line to pythia@anl.gov.

In response to a request longer than 10,000 bases, you may receive
this help message.  Since the "Rpts" requests are much more
time-demanding than the "Alu" and "Smpl" requests, we ask you not to
send more than a total of 10,000 bases of "Rpts" requests to Pythia
per day. If overloaded by too many requests, Pythia may start ignoring
requests silently.
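
One way to stay within the 10,000-base limit is to group sequences into
separate requests before sending. The Python sketch below is only an
illustration: the function name batch_entries and the greedy grouping
strategy are assumptions of this example, not something Pythia
prescribes.

```python
MAX_BASES_PER_REQUEST = 10_000  # per-request limit stated in the help text


def batch_entries(entries, limit=MAX_BASES_PER_REQUEST):
    """Greedily group (name, sequence) pairs so each group fits the limit."""
    batches = []
    current = []
    current_total = 0
    for name, seq in entries:
        if len(seq) > limit:
            raise ValueError(f"{name} alone exceeds {limit} bases")
        # Start a new batch when adding this sequence would exceed the limit.
        if current and current_total + len(seq) > limit:
            batches.append(current)
            current, current_total = [], 0
        current.append((name, seq))
        current_total += len(seq)
    if current:
        batches.append(current)
    return batches
```

Each resulting batch can be sent as its own request. For "Rpts"
requests, the daily limit above means that batches would also need to
be spread over several days.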

Examples and more specific instructions are given below. If you
have questions or comments, if you do not get a response from
Pythia within 2 days, or if you get nonsensical output, please
contact pythia-admin@anl.gov.


EXAMPLE 1: Identification of occurrences of repetitive elements
           ----------------------------------------------------

An example of a request, starting from the "Subject"-line, follows.

Subject: Rpts
;
HUMGENE1
CTTCTTTGTGGCATTCTGCTGTCGTATACCATGTGGAACACATTAAGAACGTTATGGCCAGGCGTGTTGG
CTCACGCCTGTAATCCTAGCACTTTGGGAGGCCAAGGTGGACAGATCACCTGAGGTTGGGAGTTCGAGAC
CAGCCTGGCCAGCATGCCGAAACCCTGTCTCTACTAAAAATACAAAAATTAGCCAGGCATGGTGGCACAC
ACTTGTAATCCGAGCTACTCGGGAGGCTGAAGAAGGAGAATCGCTTAAACCCAGGAGGCGGAGGTTGCAG
TGAGCTGAGATTGCACCGTTGCAATCCAGCCTGGGCAACAGAGTGAAACTCCATCTCAGAAAAAAAAAAA
AAGTTACAATTGGGTGTCACATAGACAGTGAGGAGTAGTGGAAAGAGTGTTAGATTTGGGGTAAGAGAAC
TGTGTCTCCTGGGCTTGAGTCCTGATGCCAACTCTCACAAGATGTGTCACCGTACAGCACGACGCTTATA
CTTTTTTTTTTTTTTTTTTTGAGACAGGATCTTGCTCTGTCTCGAGCTGGAGTGCAGTGGTATGATCACA
GCTCACTGCAGCCTCAAACTCCCAGGCTCAGGCCATCCTCCCACCTCAGCCTCCCAAGTAGCTGGGACTA
CAGGCATGTACCACCATGCCCAAATAA1
;
HUMGENE2
CAATAAAATCCCAATGCTTCCGCTGCAGAAGTCCAAGAGGACATGACTGCGGCTCCATCTAGTCAAGCCC
AGGGCAGGAATTCCCTTCCAGGAAACCAAGCCAGAGCGCTGTGGTCTCTGGGCTGCCAAGATGTCTCAGA
CAATGGTCTAGCCCTTCAGCCCACAGAAATCTCTGGGCAAAATTATCTCCCAGCATTGACAGACGAATGG
ATAAACAAAATGTGTTCTATCCCACAGTGGAGTATTATTCAGCTTTAAAAAGGAAGGAAAAATGCTGGGC
GCGGTGGCTCACACCTTGATCCCAGCACTTTTGGGAGGCTGAGGAAGGAGGATCATTTGTGCCCAGGAGT
TCGAGACTAGCCTGGACAACATAGAGAAACCTTGTCTCTACACACACACACACACACACACACACACACA
CTCTCTCTCTCTCTCTCTCTCAGCCAGGCACAGTGGCACATGCCTGAAGTCCCAGCTCTGGGAAGCTGAG
GCAGGAGGATCTCTTGAGCCTGGTGGGTCAAGGCTGCAGTGAACCATGTTCATGCCACTGCACTCCAGTC
TGGATGACAGAGCGAGACCTAGTCTCAAAAAAAAAAAAAAAAAAAAAAAAAAGAAAGAAGAAAGAAGAAA
GGAAAAAGAAAAAAA1


EXAMPLE 2: Identification of subfamily membership of Alu sequences
           ------------------------------------------------------- 

A separate entry should be created for each Alu sequence.  The Alu
sequences must be in the same orientation as the Alu consensus
described in reference (2.2) or (2.3).  If the orientation of an Alu
sequence is not certain, you may try sending the sequence in both
orientations.  An example of a request, starting from the
"Subject"-line, follows.

Subject: Alu
;
MAYBEALU1
GGCCGGGCGCGGTGGCTCACGCCTGTAATCCCAGCACTTTGGGAGGCCGAGGCGGGCGGATCACCTGAGG
TCAGGAGTTCGAGACCAGCCTGGCCAACATGGTGAAACCCCGTCTCTACTAAAAATACAAAAATTAGCCG
GGCGTGGAAACGCGCGCCTGTAATCCCAGCTACTCGGGAGGCTGAGGCAGGAGAATCGCTTGAACCCGGG
AGGCGGAGGTGGCAGTGAGCCGAGATCGCGCCACTGCACTCCAGCCTGGGCGACAGAGCGAGACTCCGTC
TCAAAAAAAA1
;
MAYBEALU2
GGCCGGAAATTGTGGCTCACGCCTGTAATCCCAGCACTTTGGGAGGCCGAGGCGGGCGGATCACCTGAGG
TCAGGAGTTCGAGACCAGCCTGGCCAACATGGTGAAACCCCGTCTCTACTAAAAATACAAAAAGGGGCCG
GGCGTGGTGGCGCGCGCCTGTAATCCCAGCTACTCGGGAGGCTGAGGCAGGAGAATCGCTTGAACCCGGG
AGGCGGGGGTTGCAGTGAGCCGAGATCGCGCCACTGCACTCCAGCCTGGGCGACAGAGCGAGACTCCGTC
TCAAAAACAAACAAACAAA1


EXAMPLE 3: Identification of simple DNA regions
           ------------------------------------

The parameter settings are the same as those described in reference
(3.1). An example of a request, starting from the "Subject"-line,
follows.

Subject: Smpl
;
HUMGENE1
TTCCGATAGTGGCTCAGTTTTCTACTTACATAAAAAGACAGCACATTCTCTTAGCAATATGTGTTTGTAT
GTGTGTGTGTGTGTGTGTGTGTGTGTGTATATATATATATATATATATATAATTTAGAGGCCGGAAATTG
TCAGGAGTTCGAGACCAGCCTGGCCAACATGGTGAAACCCCGTCTCTACTAAAAATACAAAAAGGGGCCG
GGCGTGGTGGCGCGCGCCTGTAATCCCAGCTACTCGGGAGGCTGAGGCAGGAGAATCGCTTGAACCCGGG
AGGCGGGGGTTGCAGTGAGCCGAGATCGCGCCACTGCACTCCAGCCTGGGCGACAGAGCGAGACTCCGTC
TCAAAAACAAACAAACAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA1
;
HUMGENE2
GGCGTGGTGGCGCGCGCCTGTAATCCCAGCTACTCGGGAGGCTGAGGCAGGAGAATCGCTTGAACCCGGG
GATAGATGATTAAATAGATGATACATAGATGATAGATAATGATAAATAGATGATAGATGATAGATGATAG
GGCGTGGTGGCGCGCGCCTGTAATCCCAGCTACTCGGGAGGCTGAGGCAGGAGAATCGCTTGAACCCGGG
GTGATAGATAGATTGATAGATGATAGAAGATTGATAGATGATAGATAGATAGATGATTAAATAGATGATA
CATAGATGATAGATAATGATAAATAGATGATAGATGATAGATGATAGGTGATAGATAGATTGATAGATGA
TAGAAGATTGATAGATGATAGATACATAGGTGATA1
________________________________________