Genome Information Research Center, Osaka Univ.

QuickSearch e-mail server


HELP [QUICKSEARCH]


Introduction
------------

Mail-QUICKSEARCH is based on the QUICKSEARCH and QUICKSHOW programs developed
by John Devereux as implemented in the GCG package [1]. These programs have
been considerably improved by Peter Rice at EMBL.

It allows you to perform very rapid comparisons of your nucleic acid sequences
against the EMBL and GenBank databases including the most recent entries.

It answers the question: does this sequence, possibly with a small number of
mismatches, already occur in the database?


Note
----

This is not only a new service; the underlying programs and algorithms are
also still experimental. Feel free to send us any comments, proposals,
ideas etc.!


How to Use Mail-QUICKSEARCH
---------------------------

Using Mail-QUICKSEARCH is simple. Send a properly formatted normal mail message
to
                              QUICK@EMBL-Heidelberg.DE

and wait for the results to drop into your mailbox.

Please do not send interactive messages; the software cannot handle them!


The Input Format
----------------

Mail-QUICKSEARCH is an automatic process that runs without any human
intervention, so it understands only a limited set of commands. You therefore
have to adhere to a well-defined syntax, which is easy to learn. Some general
rules are:

- Your mail message must contain only one command per line.
- There is only one mandatory command, SEQ. All the other commands
  are optional, and default values will be used whenever they are not specified.
- You can use both uppercase and lowercase characters, or mix them.
- The order of the commands is not important, but make sure that SEQ is the last
  one, since everything following this line will be treated as a sequence (see
  below).
- Blank lines or space characters are accepted.

Here is a list of valid commands that are accepted by Mail-QUICKSEARCH. Please
remember that the only required command is SEQ and the default values are
almost always adequate.

HELP                      Sends you this help file.

LIB libraryname           "libraryname" can be one of the following (the
                          default is ALL):

                          ALL     all EMBL and GenBank entries
                          GENEW   new EMBL and GenBank entries since latest
                                  release only
   
WINDOW n                  The values of WINDOW and STRINGENCY determine the
STRINGENCY n              sensitivity of a search. The default values are
                          calculated from the length of your query sequence
                          (WINDOW is length/20 minus 1 with a maximum of 15;
                          STRINGENCY is WINDOW/2).
                          Increasing the window size and decreasing STRINGENCY
                          (e.g. WINDOW 20, STRINGENCY 5) will increase the
                          sensitivity of the comparison, i.e. you may find
                          more distantly related sequences.
                          Increasing the window size and increasing STRINGENCY
                          (e.g. WINDOW 50, STRINGENCY 47) will decrease the
                          sensitivity. Only very few mismatches will be
                          tolerated. For exact matches use PERFECT instead.

PERFECT                   This option reports only exact matches. It is
                          equivalent to specifying a MATCH value of 100.

MATCH n                   Only database entries that show overlaps of more than
                          n% identity to your query sequence will be presented. 
                          The default value is 90.

BEST                      Determines the algorithm used for the alignment of
                          the query sequence to the database sequences. If 
                          this option is used, Mail-QUICKSEARCH will use a
                          local homology algorithm [2] to align sequences.
                          Default is a Needleman-Wunsch alignment [3] that
                          finds the best overall alignment.

ONE                       Only the strand given in your mail message is compared
                          against the database. If you don't specify ONE then
                          the complementary strand is searched as well.

TITLE string              QUICKSEARCH will use "string" as the Subject line of
                          the mail message that contains the output of your job.
                          Do not use single or double quotes in the string!

SEQ                       (MANDATORY)
                          *Everything* following this line up to
                             a) the end of the mail message, or
                             b) a line starting with the word END
                          will be treated as part of the sequence.
                          Don't put sequence information on the same line on
                          which you put the SEQ command or the END command.

                          No special format of the query sequence is required.
                          You may include numbering, but make sure to remove all
                          sorts of comments and unrelated information like mail
                          signatures if you don't use END!
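
The rules above can be sketched as a small parser. This is only an
illustration of the documented syntax, not the code that runs on the server:
the function name parse_request is hypothetical, and the way non-sequence
characters are filtered (everything that is not a letter is dropped) is an
assumption.

```python
def parse_request(body):
    """Split a request into (options, sequence) following the documented rules."""
    options = {}
    sequence_lines = None  # becomes a list once the SEQ line has been seen
    for line in body.splitlines():
        stripped = line.strip()
        if sequence_lines is not None:
            # After SEQ, everything is sequence until a line starting with END.
            if stripped.upper().split()[:1] == ["END"]:
                break
            sequence_lines.append(stripped)
            continue
        if not stripped:
            continue  # blank lines are accepted
        parts = stripped.split(None, 1)
        keyword = parts[0].upper()  # commands are case-insensitive
        if keyword == "SEQ":
            sequence_lines = []
        else:
            options[keyword] = parts[1].strip() if len(parts) > 1 else ""
    if sequence_lines is None:
        raise ValueError("SEQ is the only mandatory command")
    # Assumption: numbering and spaces are removed by keeping letters only.
    sequence = "".join(ch for ch in "".join(sequence_lines) if ch.isalpha())
    return options, sequence.upper()
```

Without an END line, a mail signature after the sequence would end up in the
sequence, which is why END is recommended.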

Examples of input files:

TITLE This is a test using part of a human globin gene
SEQ
     201  ACAACTTTGA CTTTGAGAAA AGAGAGGTGG AAATGAGGAA AATGACTTTT

     251  CTGTATTAGA TTCCAGTAGA AAGAACTTTC ATCTTTCCCT CGTTTTTTTT

     301  GTTTTAAAAC ATCTATCTGG AGGCAGGACA AGTATGGTCG TTAAAAAGAT

     351  GCAGGCAGAA GGCATATATT GGCTCAGTCA AAGTGGGGAA CTTTGGTGGC

     401  CAAACATACA TTGCTAAGGC TATTCCTATA TCAGCTGGAC ACATATAAAA

     451  TGCTGCTAAT GCTTCATTAC AAACTTATAT CCTTTAATTC CAGATGGGGG

     501  CAAAGTATGT CCAGGGGTGA GGAACAATTG AAACATTTGG GCTGGAGTAG

     551  ATTTTGAAAG TCAGCTCTGT GTGTGTGTGT GTGTGTGCGC GCACGTGTGT
END

or:

WINDOW 30
STRINGENCY 10
ONE
MATCH 95
BEST
SEQ
agcgcgcgtcgtgcgtgcgtgcagatgacaaagtgacgtg
gacgatggcatgacgatacgatgcagatgacgatg
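
If you prepare many queries, a message like the ones above can be generated
by a short script. The following is a minimal sketch in Python; the function
name build_request and its parameter names are hypothetical, and it only
produces the message text (it does not send any mail).

```python
def build_request(sequence, title=None, lib=None, window=None,
                  stringency=None, match=None, one=False, best=False):
    """Return the text of a request with one command per line."""
    lines = []
    if title is not None:
        if '"' in title or "'" in title:
            raise ValueError("TITLE must not contain quotes")
        lines.append(f"TITLE {title}")
    if lib is not None:
        lines.append(f"LIB {lib}")
    if window is not None:
        lines.append(f"WINDOW {window}")
    if stringency is not None:
        lines.append(f"STRINGENCY {stringency}")
    if match is not None:
        lines.append(f"MATCH {match}")
    if one:
        lines.append("ONE")
    if best:
        lines.append("BEST")
    # SEQ must be the last command; END protects against a mail signature.
    lines.append("SEQ")
    lines.extend(sequence.split())
    lines.append("END")
    return "\n".join(lines) + "\n"


print(build_request("agcgcgcgtcgtgcgtgcgtgcagatgacaaagtgacgtg",
                    window=30, stringency=10, match=95, one=True, best=True))
```

Options that are left out are simply not written, so the server defaults
apply.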


Restrictions
------------

The query sequence may not be longer than 100000 bases. If a match longer than
32000 bases is found, only the first part may be reported. 

The maximum WINDOW size is 50.


The Mail-QUICKSEARCH output
---------------------------

After sending your query to EMBL you will receive two mail messages from the 
Mail-QUICKSEARCH program.

The first one is sent to you immediately after your message is processed. If
Mail-QUICKSEARCH had any problem with your query, it will tell you so.
Otherwise you will be notified that your job has been successfully submitted
to the QUICKSEARCH batch queue, and that the results will be mailed to you
after completion.

Here is an example:
(the > character at the beginning of a line indicates that this line is taken 
from a sample file. It is not contained in the messages that you receive):

> From: QUICK@EMBL-Heidelberg.DE
> Subject: Thanks for your call;  here's the log ...
> To: JOEBIOL@EMBL-Heidelberg.DE
> Message-id: 
> X-Organization: European Molecular Biology Laboratory, Heidelberg.
> X-Envelope-to: JOEBIOL
> X-VMS-To: in%"JOEBIOL@EMBL-Heidelberg.DE"
> 
> 
> TITLE This is a test using part of a human globin gene
> SEQ
>      201  ACAACTTTGA CTTTGAGAAA AGAGAGGTGG AAATGAGGAA AATGACTTTT
> 
>      251  CTGTATTAGA TTCCAGTAGA AAGAACTTTC ATCTTTCCCT CGTTTTTTTT
> 
>      301  GTTTTAAAAC ATCTATCTGG AGGCAGGACA AGTATGGTCG TTAAAAAGAT
> 
>      351  GCAGGCAGAA GGCATATATT GGCTCAGTCA AAGTGGGGAA CTTTGGTGGC
> 
>      401  CAAACATACA TTGCTAAGGC TATTCCTATA TCAGCTGGAC ACATATAAAA
> 
>      451  TGCTGCTAAT GCTTCATTAC AAACTTATAT CCTTTAATTC CAGATGGGGG
> 
>      501  CAAAGTATGT CCAGGGGTGA GGAACAATTG AAACATTTGG GCTGGAGTAG
> 
>      551  ATTTTGAAAG TCAGCTCTGT GTGTGTGTGT GTGTGTGCGC GCACGTGTGT
> END
> 
> * A QUICK batch job has been submitted to the QUICK batch queue.
> * The following parameters are used:
> * Title: This is a test using part of a human globin gene
> * Library to be searched:  ALL
> * Window:                  15
> * Stringency:              7
> * Match:                   90%
> * Both strands searched
> * All overlaps better than 90% will be reported
> * A global alignment method will be used
> * The result file will be mailed to you after completion.
 
The second file that you will receive contains the results of your query.


The Search Results
------------------

The first lines following the mail header contain information about the
sequence name, the date and the parameters used:

>  QUICKMATCH of: JoeBiol_28007243.Quick  April 25, 1990  10:57
> 
>  ** MatchStringency: 0.90 **
> 
> ! QUICKSEARCH of: Sys$Scratch:JoeBiol_28007243.Seq;  April 25, 1990  10:51
> 
>  Comparison Table: Gencoredisk:[Gcgcore.Rundata]Nwsgapdna.Cmp
> 
>  Gap Weight: 5.00  Gap Length Weight: 0.10    ..


Now, all hits better than your MATCH value will be shown as alignments between
the database sequence and your query sequence. The order of these hits does
*not* reflect the quality of the alignments!

If there were no hits (nothing similar in the database, or your WINDOW
value was too low or STRINGENCY too high to find anything), you will get
the following message:

 *** No possible matches were found by QUICKSEARCH ***

If there were some hits, but none were good enough for your MATCH value
(there are often several "random" hits that mean nothing), you will see:

 *** No matches accepted at stringency 0.99 or better ***

where "0.99" comes from a MATCH 99 (percent) command for example.

In the successful alignments, you will find the following values:

"Gaps:" gives you the number of gaps introduced to produce the alignment.
"Quality:" is the score obtained for this alignment (see Algorithm).
"Ratio:" is the quality divided by the number of residues in the overlap
region between the two sequences (usually the length of the shorter sequence).

>  JoeBiol_28007243.Seq;2 Check: 5,507 length: 400 from:      1  to: 400
>  JoeBiol_28007243.Seq;  Length: 400  April 25, 1990  10:50  Check:5,507
> 
>  Empri:Ggagglog     Check: 7,760  length:   1,797  from:     1  to: 1,797
>     Gorilla fetal A-gamma-globin gene. 1/86
> ID   GGAGGLOG   standard; DNA; 1797 BP.
> AC   X03112;
> DT   20-JAN-1986 (annotation)
> DE   Gorilla fetal A-gamma-globin gene
> KW   A-gamma-globin; direct repeat; gamma-globin; tandem repeat.
> OS   Gorilla gorilla (gorilla)
> OC   Eukaryota; Metazoa; Chordata; Vertebrata; Tetrapoda; Mammalia;
> OC   Eutheria; Primates.
> RN   [1] (bases 1-1797)
> RA   Scott A.F., Heath P., Trusko S., Boyer S.H., Prass W., Goodman M., . . . 
>          Diagonal: 754   Range: -399/+400
>              Gaps: 0  Quality: 379.0  Ratio: 0.947
>                   .         .         .         .         .
>        1 ACAACTTTGACTTTGAGAAAAGAGAGGTGGAAATGAGGAAAATGACTTTT 50
>          ||||||||||||||||||| ||||| |||||||||||| |||||||||||
>      755 ACAACTTTGACTTTGAGAATAGAGAAGTGGAAATGAGGCAAATGACTTTT 804
>                   .         .         .         .         .
>       51 CTGTATTAGATTCCAGTAGAAAGAACTTTCATCTTTCCCTCGTTTTTTTT 100
>          || |||||||||||||||||||||||||||||||||||||| ||||| ||
>      805 CTTTATTAGATTCCAGTAGAAAGAACTTTCATCTTTCCCTCATTTTTGTT 854
>                   .         .         .         .         .
>      101 GTTTTAAAACATCTATCTGGAGGCAGGACAAGTATGGTCGTTAAAAAGAT 150
>          ||||||||||||||||||||||||||||||||||||||| ||||| ||||
>      855 GTTTTAAAACATCTATCTGGAGGCAGGACAAGTATGGTCATTAAACAGAT 904
>                   .         .         .         .         .
>      151 GCAGGCAGAAGGCATATATTGGCTCAGTCAAAGTGGGGAACTTTGGTGGC 200
>          ||||||||||||||||||||||||||||||||||||||||||||||||||
>      905 GCAGGCAGAAGGCATATATTGGCTCAGTCAAAGTGGGGAACTTTGGTGGC 954
>                   .         .         .         .         .
>      201 CAAACATACATTGCTAAGGCTATTCCTATATCAGCTGGACACATATAAAA 250
>          |||||||| ||||||||||||||||||||||||||| |||||||||||||
>      955 CAAACATATATTGCTAAGGCTATTCCTATATCAGCTAGACACATATAAAA 1004
>                   .         .         .         .         .
>      251 TGCTGCTAATGCTTCATTACAAACTTATATCCTTTAATTCCAGATGGGGG 300
>          |||||| |||||||||||||||||||||||||||||||||||||||||||
>     1005 TGCTGCCAATGCTTCATTACAAACTTATATCCTTTAATTCCAGATGGGGG 1054
>                   .         .         .         .         .
>      301 CAAAGTATGTCCAGGGGTGAGGAACAATTGAAACATTTGGGCTGGAGTAG 350
>          ||||| ||||||||||||||||||||||||||||||||||||||||||||
>     1055 CAAAGCATGTCCAGGGGTGAGGAACAATTGAAACATTTGGGCTGGAGTAG 1104
>                   .         .         .         .         .
>      351 ATTTTGAAAGTCAGCTCTGTGTGTGTGTGTGTGTGTGCGCGCACGTGTGT 400
>          |||||||||||||||| |||||||||||||||||||| | |     | ||
>     1105 ATTTTGAAAGTCAGCTGTGTGTGTGTGTGTGTGTGTGTGTGTGTCAGCGT 1154
>                                   .
>                                   .
>                                   .
> 
>  JoeBiol_28007243.Seq;2 Check: 5,507  length:     400  from:      1  to: 400
>     JoeBiol_28007243.Seq;  Length: 400  April 25, 1990  10:50  Check: 5,507
> 
>  Empri:Hsags01      Check: 1,418  length:     878  from:     1  to: 878
>     Human A-gamma-S globin gene IVS-2 sequence. 8/84
> ID   HSAGS01    standard; DNA; 878 BP.
> AC   X00672;
> DT   15-AUG-1984 (first entry)
> DE   Human A-gamma-S globin gene IVS-2 sequence
> KW   globin.
> OS   Homo sapiens (human)
> OC   Eukaryota; Metazoa; Chordata; Vertebrata; Tetrapoda; Mammalia;
> OC   Eutheria; Primates.
> RN   [1] (bases 1-878)
> RA   Stoeckert C.J., Collins F.S., Weissman S.M.; . . . 
>          Diagonal: 200   Range: -399/+400
>              Gaps: 0  Quality: 400.0  Ratio: 1.000
>                   .         .         .         .         .
>        1 ACAACTTTGACTTTGAGAAAAGAGAGGTGGAAATGAGGAAAATGACTTTT 50
>          ||||||||||||||||||||||||||||||||||||||||||||||||||
>      201 ACAACTTTGACTTTGAGAAAAGAGAGGTGGAAATGAGGAAAATGACTTTT 250
>                   .         .         .         .         .
>       51 CTGTATTAGATTCCAGTAGAAAGAACTTTCATCTTTCCCTCGTTTTTTTT 100
>          ||||||||||||||||||||||||||||||||||||||||||||||||||
>      251 CTGTATTAGATTCCAGTAGAAAGAACTTTCATCTTTCCCTCGTTTTTTTT 300
>                   .         .         .         .         .
>      101 GTTTTAAAACATCTATCTGGAGGCAGGACAAGTATGGTCGTTAAAAAGAT 150
>          ||||||||||||||||||||||||||||||||||||||||||||||||||
>      301 GTTTTAAAACATCTATCTGGAGGCAGGACAAGTATGGTCGTTAAAAAGAT 350
>                   .         .         .         .         .
>      151 GCAGGCAGAAGGCATATATTGGCTCAGTCAAAGTGGGGAACTTTGGTGGC 200
>          ||||||||||||||||||||||||||||||||||||||||||||||||||
>      351 GCAGGCAGAAGGCATATATTGGCTCAGTCAAAGTGGGGAACTTTGGTGGC 400
>                   .         .         .         .         .
>      201 CAAACATACATTGCTAAGGCTATTCCTATATCAGCTGGACACATATAAAA 250
>          ||||||||||||||||||||||||||||||||||||||||||||||||||
>      401 CAAACATACATTGCTAAGGCTATTCCTATATCAGCTGGACACATATAAAA 450
>                   .         .         .         .         .
>      251 TGCTGCTAATGCTTCATTACAAACTTATATCCTTTAATTCCAGATGGGGG 300
>          ||||||||||||||||||||||||||||||||||||||||||||||||||
>      451 TGCTGCTAATGCTTCATTACAAACTTATATCCTTTAATTCCAGATGGGGG 500
>                   .         .         .         .         .
>      301 CAAAGTATGTCCAGGGGTGAGGAACAATTGAAACATTTGGGCTGGAGTAG 350
>          ||||||||||||||||||||||||||||||||||||||||||||||||||
>      501 CAAAGTATGTCCAGGGGTGAGGAACAATTGAAACATTTGGGCTGGAGTAG 550
>                   .         .         .         .         .
>      351 ATTTTGAAAGTCAGCTCTGTGTGTGTGTGTGTGTGTGCGCGCACGTGTGT 400
>          ||||||||||||||||||||||||||||||||||||||||||||||||||
>      551 ATTTTGAAAGTCAGCTCTGTGTGTGTGTGTGTGTGTGCGCGCACGTGTGT 600
>                                   .
>                                   .
>                                   .
> 
>  JoeBiol_28007243.Seq;2 Check: 5,507  length:     400  from:      1  to: 400
>     JoeBiol_28007243.Seq;  Length: 400  April 25, 1990  10:50  Check: 5,507
> 
>  Empri:Hsggl2       Check: 7,732  length:   1,628  from:     1  to: 1,628
>     Human a gamma-globin gene. 3/83
> ID   HSGGL2     standard; DNA; 1628 BP.
> AC   V00513;
> DT   31-MAR-1983 (feature table expanded)
> DT   17-FEB-1981 (first entry)
> DE   Human a gamma-globin gene.
> KW   gamma-globin; germ line; globin.
> OS   Homo sapiens (human)
> OC   Eukaryota; Metazoa; Chordata; Vertebrata; Tetrapoda; Mammalia;
> OC   Eutheria; Primates.
> RN   [1] (bases 1-1628) . . . 
>          Diagonal: 750   Range: -399/+400
>              Gaps: 1  Quality: 374.6  Ratio: 0.937
>                   .         .         .         .         .
>        1 ACAACTTTGACTTTGAGAAAAGAGAGGTGGAAATGAGGAAAATGACTTTT 50
>              ||||||||||||||||||||||||||||||||||||||||||||||
>      751 ....CTTTGACTTTGAGAAAAGAGAGGTGGAAATGAGGAAAATGACTTTT 796
>                   .         .         .         .         .
>       51 CTGTATTAGATTCCAGTAGAAAGAACTTTCATCTTTCCCTCGTTTTT... 97
>          || ||||||||| | |||||||||||||||| |||||||   |||||   
>      797 CTTTATTAGATTTCGGTAGAAAGAACTTTCACCTTTCCCCTATTTTTGTT 846
>                   .         .         .         .         .
>       98 .TTTGTTTTAAAACATCTATCTGGAGGCAGGACAAGTATGGTCGTTAAAA 146
>           || ||||||||||||||||||||||||||||||||||||||||||||||
>      847 ATTCGTTTTAAAACATCTATCTGGAGGCAGGACAAGTATGGTCGTTAAAA 896
>                   .         .         .         .         .
>      147 AGATGCAGGCAGAAGGCATATATTGGCTCAGTCAAAGTGGGGAACTTTGG 196
>          ||||||||||||||||||||||||||||||||||||||||||||||||||
>      897 AGATGCAGGCAGAAGGCATATATTGGCTCAGTCAAAGTGGGGAACTTTGG 946
>                   .         .         .         .         .
>      197 TGGCCAAACATACATTGCTAAGGCTATTCCTATATCAGCTGGACACATAT 246
>          ||||||||||||||||||||||||||||||||||||||||||||||||||
>      947 TGGCCAAACATACATTGCTAAGGCTATTCCTATATCAGCTGGACACATAT 996
>                   .         .         .         .         .
>      247 AAAATGCTGCTAATGCTTCATTACAAACTTATATCCTTTAATTCCAGATG 296
>          ||||||||||||||||||||||||||||||||||||||||||||||||||
>      997 AAAATGCTGCTAATGCTTCATTACAAACTTATATCCTTTAATTCCAGATG 1046
>                   .         .         .         .         .
>      297 GGGGCAAAGTATGTCCAGGGGTGAGGAACAATTGAAACATTTGGGCTGGA 346
>          ||||||||||||||||||||||||||||||||||||||||||||||||||
>     1047 GGGGCAAAGTATGTCCAGGGGTGAGGAACAATTGAAACATTTGGGCTGGA 1096
>                   .         .         .         .         .
>      347 GTAGATTTTGAAAGTCAGCTCTGTGTGTGTGTGTGTGTGTGCGCGCACGT 396
>          ||||||||||||||||||||||||||||||||||||||||| | |     
>     1097 GTAGATTTTGAAAGTCAGCTCTGTGTGTGTGTGTGTGTGTGTGTGTGTCA 1146
>                   .         .         .         .         .
>      397 GTGT.............................................. 400
>          | ||                                              
>     1147 GCGTGTGTTTCTTTTAACGTCTTCAGCCTACAACATACAGGGTTCATGGT 1196
>                                   .
>                                   .
>                                   .


The first ten lines of each database entry are shown as well, so that you can
identify the entry by its accession number and ID. Only the part of the
database sequence that overlaps your query sequence is displayed.
Identities are indicated by a "|".


Retrieving Database Entries
---------------------------

You can easily get a copy of matching sequences from the EMBL or GenBank
databases by using the EMBL File Server. You should always use the accession
number as given in the AC line of EMBL entries or the Accession line of GenBank
entries.

Send a mail message to
                             NETSERV@EMBL-Heidelberg.DE

containing one command per line. The general syntax is:

                              GET NUC:accnumber

e.g.                          GET NUC:J00179

If you are new to the EMBL File Server, send a mail message containing the
line HELP to NETSERV@EMBL-Heidelberg.DE to get some introductory information.
The File Server offers the latest sequence data, several other databases and
free software for molecular biology.
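
The accession numbers can be collected from a result file automatically. The
sketch below is an illustration only: the function name get_commands is
hypothetical, and it assumes that each AC line looks like the ones in the
sample output above (the first accession number on the line is used).

```python
import re

# An AC line as shown in the sample output, with or without a leading ">".
AC_LINE = re.compile(r"^(?:>\s*)?AC\s+([A-Za-z0-9]+)\s*;")


def get_commands(result_text):
    """Return one GET command per distinct accession number, in order."""
    commands = []
    for line in result_text.splitlines():
        found = AC_LINE.match(line)
        if found:
            command = f"GET NUC:{found.group(1)}"
            if command not in commands:
                commands.append(command)
    return "\n".join(commands)
```

The returned text can be used as the body of a message to the File Server.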


The Algorithm
-------------

The QUICKSEARCH algorithm was developed by John Devereux in collaboration with
Chemical Abstracts Service (CAS) [4].

An exact description of the underlying algorithm is beyond the scope of this
help file. In general, each adjacent 20-base "word" in the database is
assigned an offset from the beginning of the database. A "hash code" is
calculated for each word and the information is stored in a "hash table" which
can be searched very quickly. The query sequence is compared to the database
by looking up every 20-base region of the query sequence in the "hash table"
of the database. To allow for some errors, a "windowing" technique is used,
and a hit is counted when STRINGENCY or more words within a window of WINDOW
words match. These "overlapping" regions are stored, and a second program,
QUICKMATCH, then aligns the identified overlapping database sequences to the
query sequence.
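
The general idea can be illustrated with a toy version in Python. This is a
rough sketch under several assumptions and not the QUICKSEARCH
implementation: a Python dictionary stands in for the hash table, "adjacent"
is read as non-overlapping database words, matching words are grouped by
diagonal (database offset minus query position), and the function names are
hypothetical.

```python
from collections import defaultdict

WORD = 20  # word length given in the text


def index_database(db):
    """Map each adjacent (non-overlapping) 20-base word to its offsets."""
    table = defaultdict(list)
    for offset in range(0, len(db) - WORD + 1, WORD):
        table[db[offset:offset + WORD]].append(offset)
    return table


def find_hits(query, table, window, stringency):
    """Return the diagonals on which at least `stringency` database words
    match within some run of `window` consecutive database words."""
    by_diagonal = defaultdict(set)
    # Look up every 20-base region of the query in the table.
    for pos in range(len(query) - WORD + 1):
        for offset in table.get(query[pos:pos + WORD], ()):
            by_diagonal[offset - pos].add(offset // WORD)
    hits = []
    for diagonal, words in by_diagonal.items():
        ordered = sorted(words)
        for i, first in enumerate(ordered):
            in_window = [w for w in ordered[i:] if w < first + window]
            if len(in_window) >= stringency:
                hits.append(diagonal)
                break
    return sorted(hits)
```

A single mismatch in the query destroys at most one matching database word in
this scheme, so a hit is still counted as long as enough of the neighbouring
words remain intact.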

WINDOW and STRINGENCY determine the sensitivity of the search. WINDOW is the
number of consecutive 20-base sequences checked, STRINGENCY determines the
number of these 20-base sequences that must match to define a "hit".
WINDOW should be less than 1/20 of the search sequence length, though 15 is
large enough for long sequences. STRINGENCY should be about half of WINDOW,
though smaller values can be used in many cases. This allows for several
single-base differences between the sequences being compared.
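
The default rule quoted under "The Input Format" (WINDOW is length/20 minus 1
with a maximum of 15; STRINGENCY is WINDOW/2) can be written down as a short
sketch. The function name is hypothetical and the use of integer division is
an assumption, although it agrees with the sample log above, where a 400-base
query gets Window 15 and Stringency 7. What the server does for very short
sequences is not stated in this help file, so no lower bound is applied here.

```python
def default_parameters(query_length):
    """Return (WINDOW, STRINGENCY) defaults for a query of the given length."""
    window = min(query_length // 20 - 1, 15)  # length/20 minus 1, at most 15
    stringency = window // 2                  # assumed to round down
    return window, stringency


print(default_parameters(400))  # the 400-base query of the sample log
```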

The alignment algorithm used by QUICKMATCH is determined by the BEST parameter.
By default, a Needleman-Wunsch alignment [3] is performed that finds the best
global alignment between two sequences. The scores (reported as "Quality") are
calculated by using a comparison table that scores 1 for matches and 0 for
mismatches, with a gap weight of 5 and a gap length weight of 0.1.

If the BEST command is used in Mail-QUICKSEARCH then, instead of looking for
the best global alignment, the best local alignments are calculated by a
Smith-Waterman algorithm [2]. The comparison table used scores 1 for matches
and -0.9 for mismatches, with gap weights of 5 and gap length weights of 0.1.
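
As a worked check, the "Quality" and "Ratio" values of the sample output can
be reproduced from these numbers. The sketch below assumes that each gap costs
the gap weight plus the gap length weight times its length, and that gaps at
the ends of the alignment are not penalised; this reading is consistent with
the samples but is not confirmed by this help file. The function name is
hypothetical. In the third sample alignment, counting the "|" marks gives 380
identities, and there is one internal gap of four positions.

```python
def alignment_quality(matches, mismatches, gap_lengths,
                      match_score=1.0, mismatch_score=0.0,
                      gap_weight=5.0, gap_length_weight=0.1):
    """Score an alignment from its counts (default scoring as stated above)."""
    gap_penalty = sum(gap_weight + gap_length_weight * n for n in gap_lengths)
    return matches * match_score + mismatches * mismatch_score - gap_penalty


# First sample: 379 identities, 21 mismatches, no gaps, 400-base overlap.
q1 = alignment_quality(379, 21, [])
print(q1, q1 / 400)   # compare with Quality: 379.0  Ratio: 0.947

# Third sample: 380 identities, one gap of length 4, 400-base query.
q3 = alignment_quality(380, 16, [4])
print(q3, q3 / 400)   # compare with Quality: 374.6  Ratio: 0.937
```

For a search with BEST, the same function could be called with
mismatch_score=-0.9 to follow the local alignment scoring described above.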


Limitations and Considerations
------------------------------

Repetitive sequences or stretches of one nucleotide may strongly influence the 
results of a QUICKSEARCH run. Make sure to remove such regions from your query 
sequence.

You can use short query sequences but QUICKSEARCH was designed to look for
sequences longer than 200 bases. 

Problems
--------

Please send any inquiries, questions or comments to
                             NETHELP@EMBL-Heidelberg.DE


Related services
----------------

QUICKSEARCH was developed for rapid searching of databases for identical or
closely related DNA sequences. Therefore it is most useful for comparing newly
determined unknown DNA sequences to the database to find out whether there are
similar sequences already existing in the database.

QUICKSEARCH is not suited for the detection of more distantly related
sequences and cannot search for protein sequences. For this purpose use FASTA
instead. An introduction to the EMBL Mail-FASTA service can be obtained by
sending a mail message to FASTA@EMBL-Heidelberg.DE containing the line 
HELP


The software
------------

Our service is based on modifications of the original QUICKSEARCH and QUICKSHOW
programs in the GCG package. These new versions called NEWQUICKSEARCH and
QUICKMATCH were written by Peter Rice, EMBL (RICE@EMBL-Heidelberg.DE). They are
available free of charge from the EMBL File Server by sending a mail message
containing the command
GET VAX_SOFTWARE:GCGQUICK.UUE
to NETSERV@EMBL-Heidelberg.DE

If you are new to the EMBL File Server you can obtain introductory help by
sending a mail message containing
HELP
HELP SOFTWARE
HELP VAX_SOFTWARE
to NETSERV@EMBL-Heidelberg.DE


Literature
----------

[1] Devereux, J., Haeberli, P. and Smithies, O. A comprehensive set of sequence
    analysis programs for the VAX. Nucl. Acids Res. 12:387-395(1984).

[2] Smith, T.F. and Waterman, M.S. Identification of common molecular
    subsequences. J. Mol. Biol. 147:195-197(1981).

[3] Needleman, S.B. and Wunsch, C.D. A general method applicable to the search
    for similarities in the amino acid sequence of two proteins. J. Mol. Biol.
    48:443-453(1970).

[4] Devereux, J. (Ph. D. thesis): A rapid method for identifying sequences in
    large nucleotide sequence databases. 1988 (reprints available from GCG)