BLAST E-Mail Server on GenomeNet
BLAST (Basic Local Alignment Search Tool) was developed by the National
Center for Biotechnology Information at the National Library of Medicine
and made available for use on the GenomeNet E-mail server. The BLAST
program employs a heuristic search algorithm to compare a protein or
nucleic acid query sequence against a protein or nucleic acid sequence
database. BLAST compares sequences with databases using an ungapped
alignment algorithm. If you use BLAST as a research tool, we ask that
the following reference be cited in your paper:
S. F. Altschul, W. Gish, W. Miller, E. W. Myers and
D. J. Lipman (1990) J. Mol. Biol. 215, 403-410.
The GenomeNet BLAST server allows you to send a specially formatted mail
message containing the nucleic acid or protein query sequence to the
BLAST server at the Supercomputer Laboratory, Institute for Chemical
Research, Kyoto University. A BLAST search is then performed against
the specified database and the results returned in a mail message.
Accessing the BLAST program
To access the program, send an electronic mail message containing the
formatted query sequence (as described below) to the following Internet
address:
blast@genome.ad.jp
If you are not on the Internet, you may need to change the format of the
address. Consult your systems manager to determine the correct
address format.
Obtaining Help
If you would like to receive instructions on using the BLAST program,
send a mail message to the address above containing the word "help" on
a single line. This document is then returned to you in a mail message.
The BLAST manual page is appended to the end of the help text.
Programs using the BLAST algorithm
Select one of the following four programs according to the purpose
of the search.
Designator   Purpose
----------   -------
blastp       To compare a protein query sequence vs.
             a protein sequence database.
blastn       To compare a nucleic acid query sequence
             vs. a nucleic acid sequence database.
blastx       To compare a nucleic acid query sequence
             translated in all reading frames vs. a
             protein sequence database.
tblastn      To compare a protein query sequence vs.
             a nucleic acid sequence database translated
             in all reading frames.
Databases for use with BLAST
The following databases are currently available for BLAST searches:
Designator   Database
----------   --------
nr-nt        Non-redundant nucleic acid sequence database
             constructed from genbank, embl, genbank-upd
genbank      GenBank nucleic acid sequence database
             (Latest quarterly release)
embl         EMBL nucleic acid sequence database
             (Latest quarterly release)
nr-aa        Non-redundant protein sequence database
             constructed from swissprot, pir, prf, genpept,
             genpept-upd
swissprot    SWISS-PROT protein sequence database
pir          PIR protein sequence database
prf          PRF protein sequence database
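The two tables above imply which program can be paired with which database. The following Python sketch checks a PROGRAM/DATALIB pair before a request is mailed. The helper name check_request and the "protein"/"nucleic" labels are illustrative assumptions, not part of the server.

```python
# Sequence types (query, database) each program expects, from the program table.
PROGRAMS = {
    "blastp": ("protein", "protein"),
    "blastn": ("nucleic", "nucleic"),
    "blastx": ("nucleic", "protein"),
    "tblastn": ("protein", "nucleic"),
}

# Sequence type held by each database designator, from the database table.
DATABASES = {
    "nr-nt": "nucleic", "genbank": "nucleic", "embl": "nucleic",
    "nr-aa": "protein", "swissprot": "protein", "pir": "protein",
    "prf": "protein",
}


def check_request(program, datalib):
    """Return the query type the program needs, or raise ValueError.

    Hypothetical helper: the server itself may report errors differently.
    """
    program, datalib = program.lower(), datalib.lower()
    if program not in PROGRAMS:
        raise ValueError("unknown program: " + program)
    if datalib not in DATABASES:
        raise ValueError("unknown database: " + datalib)
    query_type, db_type = PROGRAMS[program]
    if DATABASES[datalib] != db_type:
        raise ValueError(program + " needs a " + db_type + " database")
    return query_type
```

For example, check_request("blastx", "swissprot") returns "nucleic", while check_request("blastn", "pir") raises ValueError.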
Formatting a Query
Queries consist of a mail message with search parameters identifying
the program, the database to be searched, values related to the search,
and the query sequence to be used in the search. The mail message has
three mandatory lines, optional lines if any, and a line identifying
the query sequence as described below. These lines are typed into the
body of the mail message in the order shown below:
Search
Parameter    Mandatory   Explanation
---------    ---------   -----------
PROGRAM      Yes         Indicates the program to be used (i.e.,
                         blastp, blastn, blastx, or tblastn;
                         see list above).
DATALIB      Yes         This line specifies the database to be
                         searched (e.g., genbank, embl, pir, or
                         swissprot; see list above).
(options)    No          Each option is specified in the form
                         name=value (e.g., B=20),
                         as described in the attached manual page.
BEGIN        Yes         This line must be included in the message.
                         It should be the last parameter and followed
                         on the next line with the query sequence.
                         See below for the format of the query
                         sequence.
The remainder of the message contains the query sequence in FASTA
format. See the sample below.
Preparing Files for Similarity Searches
Only one query sequence is allowed per mail message and your sequence
must be in FASTA format. If your sequence file is in another format,
you can use a text editor to convert it to FASTA format.
The format includes a mandatory comment line beginning with a greater-than
sign ">" followed by the name of the sequence, a space, and an optional
note about the sequence. The sequence data begin on the next line without
the greater-than sign. For example:
>AGREP4 Monkey SV40-like genomic segment promoting transcription.
ccccttcaaatctattacaaggtgagcgtctcgccaaggcaatgaaatcgcaatatgatg
tttccatttactttggattatacgtcattataaa
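As a rough illustration of the format just described, the Python sketch below writes a FASTA record from a name, an optional note, and raw sequence text. The function name to_fasta and the 60-character line width are assumptions made for the example; the server only requires lines shorter than 80 characters.

```python
def to_fasta(name, sequence, note="", width=60):
    """Format one sequence as a FASTA record (illustrative helper)."""
    if not name or any(ch.isspace() for ch in name):
        raise ValueError("sequence name must be non-empty without spaces")
    # Comment line: ">" + name, then a space and the optional note.
    header = ">" + name + (" " + note if note else "")
    # Drop blanks and line breaks that may be present in the raw text.
    residues = "".join(sequence.split())
    if not residues:
        raise ValueError("sequence is empty")
    lines = [residues[i:i + width] for i in range(0, len(residues), width)]
    return "\n".join([header] + lines) + "\n"


if __name__ == "__main__":
    print(to_fasta("AGREP4", "ccccttcaaatc tattacaagg", "example note"), end="")
```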
Sending the Query Sequence
Use your local mail program to send your query sequence. Most
mail programs allow you to import a file containing your sequence into
the mail message. You should import your sequence file into the mail
message on the line after "BEGIN". Please follow the format in the
following example of a BLAST request PRECISELY, but note that the
program is case-insensitive, i.e. either upper or lower case letters
may be used.
Here is an example of a mail message sent for a BLAST search. Note that
the first four lines are a mail header that is automatically created
when you address a mail message. Nothing need be entered for the
Subject. NOTE: the text that you enter into the body of the message
begins with the "PROGRAM" keyword below (do not add blank lines in the
message). Each line of information must be less than 80 characters in
length. Longer lines may be truncated.
From: somebody@someaddress.somewhere.ac.jp Wed May 27 16:34:06 1992
Date: Wed, 27 May 92 16:38:02 JST
To: blast@genome.ad.jp
Subject:
PROGRAM blastn
DATALIB genbank
B=20
BEGIN
>BOVPRL GenBank entry BOVPRL from gbmam file. 907 nucleotides.
tgcttggctgaggagccataggacgagagcttcctggtgaagtgtgtttcttgaaatcat
caccaccatggacagcaaa
The example above uses the three mandatory keyword lines,
PROGRAM, DATALIB, and BEGIN, and one optional line (B=20) limiting to 20
the number of database sequences for which high-scoring segment pairs
are reported.
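The message body described above can also be assembled by a short script. The Python sketch below is one possible way to do it; the function name build_blast_request and the use of a dictionary for options are assumptions of the example, and actually mailing the text to the server is left to your local mail program.

```python
def build_blast_request(program, datalib, fasta_text, options=None):
    """Return the body of a BLAST mail request (illustrative helper)."""
    lines = ["PROGRAM " + program, "DATALIB " + datalib]
    # Optional name=value lines go between DATALIB and BEGIN.
    for name, value in (options or {}).items():
        lines.append("{}={}".format(name, value))
    lines.append("BEGIN")
    # The query follows BEGIN directly; blank lines are dropped.
    fasta_lines = [ln for ln in fasta_text.splitlines() if ln.strip()]
    if not fasta_lines or not fasta_lines[0].startswith(">"):
        raise ValueError("query must start with a FASTA '>' comment line")
    lines.extend(fasta_lines)
    for ln in lines:
        if len(ln) >= 80:
            raise ValueError("line is not shorter than 80 characters: " + ln)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    query = ">BOVPRL example query\ntgcttggctgaggagccatagg\n"
    print(build_blast_request("blastn", "genbank", query, {"B": 20}), end="")
```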
Handling the Results of a BLAST Search
When the results are returned, use your local mail program to view
them. You can transfer the results of a BLAST search to a separate
disk file to free up space in your mail directory. Consult the
documentation for your local mail program for the commands to read
and transfer mail.
Retrieving Individual Entries Found in BLAST Searches
Database entries can be retrieved by either entry name or accession
number. To use the GenomeNet database retrieval server, send an electronic
mail message to:
dbget@genome.ad.jp
To get started, send a mail message to the address above containing the
word "help" on a single line in the body of the mail message.
Last Update: 93/04/08
==========================================================================
BLAST Manual Page
BLAST(1) USER COMMANDS BLAST(1)
NAME
blastp, blastn, blastx, tblastn - rapid sequence database
query programs using the BLAST algorithm
SYNOPSIS
blastp aadb aaquery [E=#] [S=#] [E2=#] [S2=#] [W=#] [T=#] [X=#]
       [M=subfile] [Y=#] [Z=#] [K=#] [L=#] [H=#] [V=#] [B=#]
blastn ntdb ntquery [E=#] [S=#] [W=#] [X=#] [M=#] [N=#] [Y=#] [Z=#]
       [K=#] [L=#] [H=#] [V=#] [B=#] [[top][bottom]]
blastx aadb ntquery [E=#] [S=#] [W=#] [T=#] [X=#] [M=subfile]
       [Y=#] [Z=#] [C=#] [K=#] [L=#] [V=#] [B=#]
       [[top][bottom]]
tblastn ntdb aaquery [E=#] [S=#] [E2=#] [S2=#] [W=#] [T=#] [X=#]
        [M=subfile] [Y=#] [Z=#] [C=#] [K=#] [L=#]
        [H=#] [V=#] [B=#] [[top][bottom]]
DESCRIPTION
BLAST (Basic Local Alignment Search Tool) is the heuristic
search algorithm employed by the programs blastp, blastn,
blastx, and tblastn. The four programs are used for the
following purposes:
blastp
     to compare an amino acid query sequence vs. a protein
     sequence database;
blastn
     to compare a nucleotide query sequence vs. a nucleotide
     sequence database;
blastx
     to compare a nucleotide query sequence translated in
     all reading frames vs. a protein sequence database;
tblastn
     to compare a protein query sequence vs. a nucleotide
     sequence database dynamically translated in all reading
     frames.
Whenever a nucleotide query sequence or nucleotide database
is involved, both strands (or all 6 reading frames) are
searched by default. The "top" and "bottom" options may be
used to restrict a search to the specified strand. (If both
options are specified, both strands will be searched).
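To make the strand and reading-frame terminology concrete, here is a small Python sketch that enumerates the codons of each frame on the selected strands. It is not taken from the BLAST sources; the function names and the frame labels +1..+3 (top strand) and -1..-3 (bottom strand) are conventions chosen for this example.

```python
# Complement table; U is treated like T, as the programs do.
COMPLEMENT = str.maketrans("ACGTUacgtu", "TGCAAtgcaa")


def reverse_complement(seq):
    """Return the bottom strand read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def codons(seq, offset):
    """Split seq into complete codons starting at the given offset."""
    return [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]


def reading_frames(seq, strands=("top", "bottom")):
    """Map frame label -> codon list for the requested strands."""
    frames = {}
    if "top" in strands:
        for offset in range(3):
            frames[offset + 1] = codons(seq, offset)
    if "bottom" in strands:
        bottom = reverse_complement(seq)
        for offset in range(3):
            frames[-(offset + 1)] = codons(bottom, offset)
    return frames


if __name__ == "__main__":
    for label, frame in sorted(reading_frames("ATGAAAC").items()):
        print(label, frame)
```

Passing strands=("top",) mimics restricting a search to one strand; the default covers all six frames.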
The unit of BLAST algorithm output is the High-scoring Segment
Pair (HSP), where a segment is an arbitrarily long run
of contiguous residues. In the programmatic implementations
of the algorithm described here, an HSP is a pair of
Sun Release 4.1 Last change: 29 December 1991 1
segments, one from the query sequence and one from a database
sequence, where the score of their ungapped alignment
meets or exceeds a parametrized, positive-valued cutoff. A
set of zero or more HSPs is thus defined by two sequences, a
scoring scheme, and a cutoff score.
A Maximal-scoring Segment Pair (MSP) is defined by two
sequences and a scoring scheme and is the highest-scoring of
all segment pairs on all diagonals. Depending on the parameters
of a BLAST sequence comparison, there may be a non-
zero probability of not finding one or more HSPs of which
the MSP is a member.
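The MSP definition can be illustrated by an exhaustive search, which is exactly what the heuristic BLAST algorithm avoids doing. The Python sketch below scans every diagonal with a maximum-sum-run scan (Kadane's algorithm) under a simple match/mismatch scheme. The scores +5 and -4 are illustrative assumptions, and the function names are hypothetical.

```python
def best_segment_on_diagonal(query, subject, q0, s0, match, mismatch):
    """Best ungapped segment pair on the diagonal starting at (q0, s0).

    Returns (score, query_start, subject_start, length).
    """
    best = (0, q0, s0, 0)
    run_score, run_start = 0, 0
    for k in range(min(len(query) - q0, len(subject) - s0)):
        if run_score <= 0:
            # A non-positive prefix never helps; start a new run here.
            run_score, run_start = 0, k
        run_score += match if query[q0 + k] == subject[s0 + k] else mismatch
        if run_score > best[0]:
            best = (run_score, q0 + run_start, s0 + run_start,
                    k - run_start + 1)
    return best


def maximal_segment_pair(query, subject, match=5, mismatch=-4):
    """Highest-scoring segment pair over all diagonals (brute force)."""
    starts = [(q, 0) for q in range(len(query))]
    starts += [(0, s) for s in range(1, len(subject))]
    best = (0, 0, 0, 0)
    for q0, s0 in starts:
        candidate = best_segment_on_diagonal(query, subject, q0, s0,
                                             match, mismatch)
        if candidate[0] > best[0]:
            best = candidate
    return best


if __name__ == "__main__":
    print(maximal_segment_pair("TTACGTACGG", "GGACGTACCC"))
```

Any segment pair whose score meets the cutoff S would be an HSP; the MSP is simply the best of them.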
PARAMETERS
Parameters are modified using a name=value syntax, e.g.,
E=0.05 or S=100.
E is interpreted as the expected number of MSPs that will
satisfy the cutoff score under the random sequence model.
The value of E approximates the expected number of HSPs that
will be found during the course of an entire database
search. The default value for E is 10, and the permitted
range for this real-valued parameter is 0 < E <= 1000.
S is the cutoff score for reporting HSPs. Higher scores
correspond to increasing statistical significance (lower
probability or reduced expected frequency of occurrence).
Any positive-scoring alignments which the programs find but
which score below S are not reported. Unless S is explicitly
set on the command line, its default value is calculated
from the value of E.
The values for E and S are interconvertible; the conversion
depends on the following factors: the length and residue
composition of the query sequence; the length of the
database and a fixed, hypothetical residue composition for
it; and the scoring scheme employed. The scoring scheme
used by blastp, blastx, and tblastn is a substitution
matrix; the scoring scheme used by blastn is a positive
reward score for matching residues and a negative penalty
score for mismatched residues.
When both of the parameters E and S are specified on the
command line, the one resulting in the highest (most restrictive)
cutoff score will be used. When neither of these
parameters is specified on the command line, the default
value for E is used to calculate the cutoff score.
For a given value of E (e.g., the default value of 10), a
given query sequence, and a single scoring scheme, the calculated
value of the cutoff score S will be different when
searching databases of different lengths. To normalize the
statistics reported when databases of different lengths are
searched, the parameter Z (see below) may be set to a constant
value for all database searches.
S takes on only integral values in the present implementations
of the BLAST algorithm. When the cutoff score is set
implicitly via E, S is rounded to the least integral value
required to satisfy E. Since the rounding procedure can
decrease the effective value of E, the calculated value for
S is used to back-calculate the effective value for E. For
example, if the user specifies E = 50 on the command line, a
cutoff score that is rounded up by 0.9 units to the smallest
satisfying integer might correspond to an expected number of
HSPs of only 43. In this case, the value displayed for E at
the end of the program's report will be 43.
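The rounding behaviour described above can be sketched with the Karlin-Altschul relation E = K * m * n * exp(-Lambda * S), where m and n are the effective query and database lengths (parameters Y and Z). The Python code below is a sketch under that assumption. The values used for K and Lambda are illustrative only, since the programs compute them from the scoring scheme and residue compositions.

```python
import math


def expect_from_score(score, K, lam, query_len, db_len):
    """Expected number of HSPs scoring at least `score` by chance."""
    return K * query_len * db_len * math.exp(-lam * score)


def cutoff_from_expect(expect, K, lam, query_len, db_len):
    """Return (S, effective E): the least integer S satisfying `expect`.

    The tiny tolerance guards against floating-point noise when the
    real-valued cutoff is already an integer.
    """
    real_s = math.log(K * query_len * db_len / expect) / lam
    s = max(1, math.ceil(real_s - 1e-9))
    return s, expect_from_score(s, K, lam, query_len, db_len)


if __name__ == "__main__":
    # Illustrative (assumed) statistics and lengths.
    s, effective_e = cutoff_from_expect(50, K=0.1, lam=0.3,
                                        query_len=250, db_len=10_000_000)
    print("S =", s, "effective E =", round(effective_e, 1))
```

With these assumed values, a requested E of 50 gives a real-valued cutoff just above 51, so S becomes 52 and the back-calculated E falls to about 42. Fixing db_len to one constant for every search corresponds to setting Z.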
When at least one HSP is found involving any given database
sequence, the programs blastp and tblastn search the database
sequence a second time for HSPs that satisfy a lower
cutoff score, S2. In essence, the second-pass search gives
these programs the opportunity to report any low-
significance HSPs they may have found that might be of
interest within the context of finding one or more higher-
scoring (perhaps statistically significant) HSPs. Poisson
statistics may indicate that the lower-scoring (higher-
probability) HSPs are statistically significant when their
frequencies of occurrence are considered.
In a relationship similar to that between the parameters E
and S, S2 can be set explicitly on the command line or it
will be calculated from the setting of E2. Whereas S is
related to E by the size of the database and the length of
the query sequence, S2 is related to E2 by the lengths of a
pair of hypothetical protein sequences of 300 residues each.
In other words, E2 approximates the number of HSPs one would
expect to find when comparing two protein sequences of
length 300, one having the composition of the query sequence
and the other having the hypothetical residue composition of
the database. If a second-pass search is not desired, setting
E2 to zero (0) turns this feature off. If S2 happens
to be equal to or greater than the primary cutoff score, a
second-pass search is likewise not performed.
The user should be forewarned that, with no other knowledge
about a positive-scoring segment pair than its score, the
chance that the BLAST algorithm will not find the alignment
increases as the score of the alignment decreases. Consequently,
the low-scoring HSPs looked for in the second-pass
search have a smaller chance individually of being found.
With a fixed scoring scheme, the probability of missing an
alignment can be decreased by: lowering the neighborhood
word-score threshold, T, while keeping the word size, W,
constant; lowering both W and T appropriately (see Altschul
et al., 1990); and/or raising the word-hit-extension drop-
off score X (described below).
W is the word size for finding initial hits against the
database sequences. Each hit is extended in both directions
along the corresponding diagonal of an imaginary 2-
dimensional matrix until the segment score drops off by at
least the quantity X. The default value for W is 3 amino
acids for blastp, blastx, and tblastn, and 12 nucleotides
for blastn. The value of W used by blastn should not be
changed, as the logic of the program source code has not
been validated for use with values other than the default.
For the other programs, which perform sequence comparisons
at the level of individual amino acids, W should generally
be restricted to values less than 5 or else the value for T
should be specified disproportionately larger to avoid consuming
vast quantities of memory for the neighborhood word
list (see below).
T is the word score threshold for generating neighborhood
words of length W from the query sequence, prior to scanning
the database (blastp, blastx, and tblastn only). Words
which have an aggregate score (through summation of the
individual residue substitution scores) of at least T when
aligned with words from the query sequence are included in
the neighborhood list. Raising the value of T increases the
likelihood of completely missing HSPs, but can decrease the
search time and memory requirements of the programs by
decreasing the size of the neighborhood list. One of the
key (but not unique) features of the BLAST algorithm is the
user-selectable trade-off in sensitivity for speed.
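The effect of T on the neighborhood list can be seen in a small Python sketch. It enumerates every word of length W and keeps those scoring at least T against each query word. To stay short it uses a toy scoring function (+5 for identity, -2 otherwise) in place of a real substitution matrix such as PAM120; that function and all names are assumptions of the example.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_score(a, b):
    """Toy substitution score standing in for a PAM matrix entry."""
    return 5 if a == b else -2


def neighborhood(query, W, T, score=toy_score, alphabet=AMINO_ACIDS):
    """Map each query offset to the words scoring >= T at that offset."""
    table = {}
    for pos in range(len(query) - W + 1):
        q_word = query[pos:pos + W]
        table[pos] = [
            "".join(word)
            for word in product(alphabet, repeat=W)
            if sum(score(a, b) for a, b in zip(q_word, word)) >= T
        ]
    return table


if __name__ == "__main__":
    for T in (15, 9, 8):
        print("T =", T, "->", len(neighborhood("MKV", 3, T)[0]), "words")
```

With the toy scores, T=15 admits only the query word itself, while T=8 also admits every word with a single mismatch: a longer list that costs more memory and search time but misses fewer HSPs.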
A generally suitable value for T is calculated at run-time,
using the residue composition and length of the query
sequence and the substitution matrix employed. The neighborhood
word-score threshold is set using an ad hoc equation
that is a function of Lambda and H. Lambda is the number of
nats of information gained per unit increase in score of an
alignment (approximately 0.69315 times the number of bits
per unit score); H is the relative entropy of the target and
background residue frequencies [Karlin and Altschul, 1990],
or the expected information available per position in an
alignment to distinguish it from chance.
The supplied PAM120 amino acid substitution matrix, with a
scale of ln(2)/2, yields a value for Lambda that is close to
0.5 bit per unit score for query sequences of typical residue
compositions. Occasionally it may be necessary to manually
set the neighborhood word-score threshold via the command
line, for which 13 may be a good value to try, but this
is highly dependent on the substitution matrix and word-
length, W, being employed.
X is a positive integer representing the maximum permissible
drop-off of the cumulative segment score during word-hit
extension. Raising X may decrease the chance that the BLAST
algorithm overlooks an HSP, but it may significantly
increase the search time, as well. If computation time is
of little concern, X might be increased several points from
its default value, but only a very marginal increase in
sensitivity might be expected.
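A minimal Python sketch of word-hit extension with a drop-off of X follows. It is an illustration of the idea rather than the programs' actual code. It assumes a simple match/mismatch scheme (+5/-4, chosen for the example), and the function names are hypothetical.

```python
def _extend(pairs, X, match, mismatch):
    """Walk along aligned residue pairs; return (best gain, its length).

    Stops once the running score has dropped at least X below the best.
    """
    best = running = best_len = 0
    for k, (a, b) in enumerate(pairs, start=1):
        running += match if a == b else mismatch
        if running > best:
            best, best_len = running, k
        elif best - running >= X:
            break
    return best, best_len


def extend_hit(query, subject, q, s, W, X, match=5, mismatch=-4):
    """Extend a word hit of length W at (q, s) in both directions.

    Returns (score, query_start, subject_start, length).
    """
    seed = sum(match if query[q + i] == subject[s + i] else mismatch
               for i in range(W))
    right, right_len = _extend(zip(query[q + W:], subject[s + W:]),
                               X, match, mismatch)
    left, left_len = _extend(zip(query[:q][::-1], subject[:s][::-1]),
                             X, match, mismatch)
    return (seed + left + right, q - left_len, s - left_len,
            W + left_len + right_len)


if __name__ == "__main__":
    q, s = "ACGTAACGTACG", "ACGTCACGTACG"
    print(extend_hit(q, s, 0, 0, W=4, X=4))   # stops at the mismatch
    print(extend_hit(q, s, 0, 0, W=4, X=10))  # extends across it
```

In the usage example, the small X abandons the extension at the first mismatch, while the larger X carries on and finds the longer, higher-scoring segment pair at the cost of examining more residues.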
For blastp, blastx, and tblastn, the default value of X is
calculated to be the minimum integral score representing at
least 10 bits of information, or a reduction in the statistical
significance of the alignment by a factor of 2 to the
10th power (about 1,000). For blastn, the default value of
X is the minimum integral score that represents at least 20
bits of information, or a reduction in the statistical
significance of the alignment by a factor of 2 to the 20th
power (about one million).
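The arithmetic behind these defaults can be sketched in Python. The code assumes that Lambda is expressed in nats per unit score, so that Lambda / ln(2) is the number of bits per unit score. The function names are illustrative, and the example Lambda of ln(2)/2 is the nominal PAM120 scale mentioned above rather than a value computed for any real query.

```python
import math


def bits_per_unit_score(lam):
    """Convert Lambda (nats per unit score) to bits per unit score."""
    return lam / math.log(2)


def score_to_bits(score, lam):
    """Information content, in bits, represented by a score."""
    return score * bits_per_unit_score(lam)


def default_dropoff(lam, bits):
    """Minimum integer score representing at least `bits` bits.

    The small tolerance avoids rounding an exact integer up by one.
    """
    return math.ceil(bits / bits_per_unit_score(lam) - 1e-9)


if __name__ == "__main__":
    lam = math.log(2) / 2  # about 0.5 bit per unit score
    print(default_dropoff(lam, 10), default_dropoff(lam, 20))
```

With Lambda = ln(2)/2, 10 bits correspond to a score of 20 and 20 bits to a score of 40; 2 to the 10th power is 1,024 and 2 to the 20th is 1,048,576, which matches the "about 1,000" and "about one million" figures.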
The command line parameters K and L can be used to set fixed
values for the Karlin statistics' K and lambda parameters,
respectively. Users should generally avoid setting these
parameters unless the full ramifications of doing so are
understood. As an example of one of the less obvious effects
of manually choosing these parameters, the value of the H
statistic reported at the end of each program's output
(which is distinct from the command line parameter of the
same name) is a function of lambda; and the default value
for the neighborhood word-score threshold parameter T is in
turn a function of H.
SCORING SCHEMES
With blastp, blastx, and tblastn, the M option can be used
to select an alternate substitution matrix file. The
default PAM120 matrix is recommended for general protein
similarity searches (Altschul, 1991). While only the PAM120
and the PAM250 matrices are provided, the pam(1) program can
be used to produce PAM matrices of any desired generation
from 2 to 511. For rigorous searches where the mutational
distance between potential homologs is unknown, Altschul
(1991) recommends performing three searches, one each with
the PAM-40, PAM-120, and PAM-250 matrices.
In blastn, M is the score for a single-letter match; N is
the score for a single-letter mismatch. M and N must be
positive and negative integers, respectively. Given the
assumption made by blastn that the 4 nucleotides A, C, G,
and T are represented equally in the database, the expected
score for the query sequence must be negative.
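The negativity requirement is easy to check numerically. The Python sketch below computes the expected per-position score under the uniform-composition assumption (two random nucleotides match with probability 0.25) and then solves 0.25*exp(Lambda*M) + 0.75*exp(Lambda*N) = 1 for Lambda by bisection, which is the standard Karlin-Altschul condition. The scores M=5 and N=-4 are illustrative assumptions, and the function names are hypothetical.

```python
import math


def expected_score(M, N, p_match=0.25):
    """Expected score of one aligned pair of random nucleotides."""
    return p_match * M + (1 - p_match) * N


def solve_lambda(M, N, p_match=0.25, tol=1e-12):
    """Positive root of p*exp(lam*M) + (1-p)*exp(lam*N) = 1."""
    if M <= 0 or N >= 0 or expected_score(M, N, p_match) >= 0:
        raise ValueError("need M > 0, N < 0 and a negative expected score")

    def f(lam):
        return (p_match * math.exp(lam * M)
                + (1 - p_match) * math.exp(lam * N) - 1.0)

    hi = 1.0
    while f(hi) <= 0:      # grow until we are to the right of the root
        hi *= 2
    lo = hi
    while f(lo) >= 0:      # shrink until we are to the left of the root
        lo /= 2
    while hi - lo > tol:   # plain bisection
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    print(expected_score(5, -4), round(solve_lambda(5, -4), 4))
```

A scheme such as M=5, N=-1 is rejected because its expected score is positive, in which case no positive Lambda exists.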
SEQUENCE LENGTH AND STATISTICAL SIGNIFICANCE
For the purpose of calculating significance levels, Y is the
effective length of the query sequence and Z is the effective
length of the database, both measured in residues. The
default values for these parameters are the actual lengths
of the query sequence and database, respectively. Larger
values signify more degrees of freedom for aligning the
sequences and reduced statistical significance for an alignment
of any given score.
GENETIC CODES
C is a non-negative integer that determines the genetic code
that will be used by blastx (tblastn) to translate the query
sequence (database sequences). The default genetic code
(C=0) corresponds to the so-called Standard or Universal
genetic code. To obtain a listing of the nine available
genetic codes and their associated numerical identifiers,
invoke either blastx or tblastn with the command line parameter
C=list.
The current list of genetic codes and their associated
values for parameter C are:
0 Standard or Universal
1 Vertebrate Mitochondrial
2 Yeast Mitochondrial
3 Mold Mitochondrial and Mycoplasma
4 Invertebrate Mitochondrial
5 Ciliate Macronuclear
6 Protozoan Mitochondrial
7 Plant Mitochondrial
8 Echinodermate Mitochondrial
POISSON STATISTICS
The occurrence of two or more HSPs involving the query
sequence and the same database sequence is modeled as a
Poisson process. An important result of applying Poisson
statistics is that an HSP with a low score and high Expect
value (low significance) may be discovered to be statistically
significant when appearing in the context of one or
more additional matches of equal or higher score against the
same database sequence.
The Poisson P-value for any given HSP is a function of its
expected frequency of occurrence and the number of HSPs
actually observed with scores at least as high. The Poisson
P-value for a group of HSP events is the probability that at
least as many HSPs would occur by chance, each with a score
at least as high as the lowest-scoring member of the group.
HSPs which appear on opposite strands of a nucleotide query
or database sequence are considered independent and distinguishable
events, and so are counted separately.
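The Poisson P-value described above can be written down directly. The Python sketch below returns the probability of observing at least n events when the expected number is x. Here x is taken to be the Expect value at the score of the lowest-scoring HSP in the group; the function name is illustrative, and the length adjustment of the Expect value mentioned in this section is not modelled.

```python
import math


def poisson_p_value(expect, n_hsps):
    """P(at least n_hsps chance HSPs) given the expected number."""
    if n_hsps < 1:
        return 1.0
    # Probability of seeing fewer than n_hsps events.
    fewer = sum(math.exp(-expect) * expect ** k / math.factorial(k)
                for k in range(n_hsps))
    return 1.0 - fewer


if __name__ == "__main__":
    # One HSP with Expect 0.5 versus two HSPs that both reach that score.
    print(round(poisson_p_value(0.5, 1), 4))
    print(round(poisson_p_value(0.5, 2), 4))
```

With an Expect of 0.5, a single HSP has a P-value of roughly 0.39, whereas two HSPs at that score or better have a joint P-value of roughly 0.09, which is how a low-scoring match can become significant in the company of others.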
Given the score of an HSP, when the expected length for an
alignment with that score (see the description of H above)
is a significant fraction of the length of the query
sequence, the Expect value used in estimation of the Poisson
P-value is reduced proportionately.
P-VALUES, ALIGNMENT SCORES, AND INFORMATION
The Expect and P-values of HSPs reported by the programs are
dependent on numerous factors including: the scoring scheme
employed, the residue composition of the query sequence, an
assumed residue composition for a typical database sequence,
and the query and database lengths. Independent of the
query and database lengths are the HSP scores themselves,
which may be readily compared between different program runs
even if the databases searched are of different lengths, as
long as all of the other relevant factors listed here were
unchanged.
Further isolation from the many variables of a search in
one's assessment of an HSP may be obtained by observing the
information content reported (in bits) for the alignments.
While the information content of an HSP may change when fundamentally
different scoring schemes are used (e.g., different
generations of PAM matrices), the number of bits
reported for an HSP will be independent of the scales to
which the matrices were generated. (In practice, this
statement is not quite true because the substitution scores
used by these programs are floating point or real values
which have been rounded to nearest integers and thus lack a
high degree of precision.) When communicating the statistical
significance of an alignment, the alignment score itself
is generally not so important as the combination of the substitution
matrix employed and the actual information content
of the alignment.
REGULATING OUTPUT
The output is categorized into three sections: a histogram
of word-hit extension scores; one-line descriptions of the
database sequences that yielded one or more HSPs; and the
high-scoring segment pairs themselves. Each section of the
output can be selectively suppressed by setting the parameters
H, V, and B to 0 (zero).
Parameter H regulates the display of a histogram of the
scores of the highest-scoring hit extensions for each database
sequence. As long as H has a non-zero value, the histogram
will be displayed (except for the blastx program,
which never displays a histogram but retains the H parameter
for command-line compatibility with the other programs).
The default value for H is 1.
Parameter V is the maximum number of database sequences for
which one-line descriptions will be reported. The default
value for V is 500. A warning message is prominently
displayed at the end of the one-line descriptions section
when HSPs are found in more than V sequences. When V is
zero, no one-line descriptions are reported and no warning
is given. Negative values for V are undefined and
disallowed.
As an example of how V can be used advantageously, if a high
value for E is desired to virtually assure in all cases that
at least one HSP will be found, selecting a small value for
V will ensure that the output will not be too voluminous;
only the most statistically significant matches will be
reported.
Parameter B regulates the display of the high-scoring segment
pairs. For positive values, B is the maximum number of
database sequences for which high-scoring segment pairs will
be reported. This may be much smaller than the actual
number of high-scoring segment pairs reported, since any
given database sequence may yield several HSPs. The default
value for B is 250. Negative values for B are undefined and
disallowed.
SUPPORT UTILITIES
Databases to be searched by these programs must first be
processed by the program setdb for protein sequence databases
(re: blastp and blastx) and the program pressdb for
nucleotide sequence databases (re: blastn and tblastn).
Point accepted mutation (PAM) matrices of various generations
can be produced automatically with the pam program.
The output can be saved in a file whose name is then specified
in the M=filename option of a blastp, blastx, or
tblastn query.
BUGS
blastn uses a large value for the wordlength, W, and does no
neighboring on these words. Consequently, the program is
suitable for finding nearly identical sequences rapidly. To
identify weak amino acid similarities encoded by nucleic
acid, use blastx or tblastn.
In blastp, blastx, and tblastn, ad hoc equations have not
been implemented yet for calculating appropriate default
values for T when W has a value other than 3 or 4.
When nucleotide sequence databases are processed into
searchable form by the pressdb program, IUPAC ambiguity
letters are replaced by an appropriate random selection from
the list A, C, G and T. (For example, an R would be replaced
on the average half of the time by an A and half of the time
by a G). Similarly, blastn replaces ambiguity letters in
the query sequence with appropriate random selections. Only
after an HSP is found that satisfies the cutoff score are
the original sequences with their ambiguities intact examined.
With blastn, the alignment score will decrease and
may consequently fall below the cutoff score if the random
replacement letter happened to match. With blastx and
tblastn, the outcome will depend upon whether a specific
amino acid can be inferred despite the ambiguity.
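The random replacement of ambiguity letters can be pictured with the following Python sketch. It uses the standard IUPAC nucleotide ambiguity codes and picks uniformly among the bases each code stands for. It is an illustration only, not the pressdb or blastn source, and the function name is hypothetical.

```python
import random

# Standard IUPAC nucleotide ambiguity codes and the bases they cover.
IUPAC_AMBIGUITY = {
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def resolve_ambiguities(seq, rng=None):
    """Replace each ambiguity letter by a random compatible base."""
    rng = rng or random.Random()
    resolved = []
    for ch in seq.upper():
        if ch == "U":
            ch = "T"  # U and T are treated the same
        if ch in IUPAC_AMBIGUITY:
            ch = rng.choice(IUPAC_AMBIGUITY[ch])
        resolved.append(ch)
    return "".join(resolved)


if __name__ == "__main__":
    print(resolve_ambiguities("ACGRYNU", random.Random(0)))
```

Because the choice is random, an R is replaced by A about half of the time and by G the other half, as in the example given in the text.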
tblastn uses only one genetic code to translate the entire
nucleotide sequence database, although the particular
genetic code employed is selectable via the parameter C.
blastn, blastx, and tblastn treat U and T residues in
nucleotide sequences the same.
With one exception, any letter in the query sequence which
is not a member of the relevant IUPAC amino acid or nucleotide
code is stripped and does not contribute to the
sequence coordinate numbers reported by the programs. The
exception is asterisks (*) in amino acid sequences, which
are interpreted as translation stops. In protein sequence
databases that are processed into searchable form by the
setdb program, non-IUPAC letters, including any punctuation
but excluding asterisks, are also stripped. The pressdb
program does not strip non-IUPAC codes, but treats them
similarly to Ns.
blastn does not incorporate the concept of a partial- or
half-match, such as when a purine in one sequence is juxtaposed
with a purine from the other. For two residues to
match at all, they both must be members of the set A, C, G
and T (or U).
When calculating the Poisson statistics, some HSPs may be
incompatible with each other (not all of them may be simultaneously
alignable without reusing some portion of either
sequence) and yet they are (incorrectly) counted as independent
events.
The user may note that the nucleotide composition of a
blastn query sequence is irrelevant to the resulting Karlin
parameters, Lambda and K. This is due to the residue composition
assumed for a typical database sequence being 25% for
each of the four nucleotides A, C, G and T. The values of the
Karlin parameters are still affected by the scoring scheme
employed. Furthermore, the individual who compiles these
programs is certainly not barred from setting a non-uniform
residue composition for the database sequences, in which
case the query composition is relevant and will be reflected
in the Karlin parameters calculated by blastn.
SEE ALSO
blast3(1).
REFERENCES
Karlin, Samuel and Stephen F. Altschul (1990). Methods for
assessing the statistical significance of molecular sequence
features by using general scoring schemes, Proc. Natl. Acad.
Sci. USA 87:2264-2268.
Altschul, Stephen F., Warren Gish, Webb Miller, Eugene W.
Myers, and David J. Lipman (1990). Basic local alignment
search tool, J. Mol. Biol. 215:403-410.
Altschul, Stephen F. (1991). Amino acid substitution
matrices from an information theoretic perspective. J. Mol.
Biol. 219:555-565.