===============================
The PredictProtein email server
===============================
PredictProtein is an electronic mail service: you send it an amino acid
sequence, and it sends back by return mail a multiple sequence alignment
and a secondary structure prediction.
The multiple sequence alignment is performed by a weighted dynamic
programming method, called MaxHom (R. Schneider and C. Sander), and the
secondary structure prediction is produced by the profile network
method, called PHD (B. Rost and C. Sander). The prediction method is
rated at 71.4% average accuracy for water-soluble globular proteins, in
the three states helix, strand, and loop.
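As an illustration of what a three-state accuracy figure measures, the sketch below computes the percentage of residues whose predicted state matches the observed state. It is not the PHD evaluation code: the function name and the one-letter state codes (H for helix, E for strand, L for loop) are assumptions made for this example.

```python
def three_state_accuracy(predicted: str, observed: str) -> float:
    """Return the percentage of residues predicted in the correct state.

    Both arguments are strings of equal length with one state letter per
    residue. The letters H, E and L are an assumed encoding of helix,
    strand and loop; any consistent alphabet works.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not predicted:
        raise ValueError("at least one residue is required")
    # Count positions where the predicted and observed states agree.
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(predicted)
```

For example, `three_state_accuracy("HHEEL", "HHELL")` counts four matching positions out of five.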
Send sequences to: PredictProtein@embl-heidelberg.de
Send questions to: Predict-Help@embl-heidelberg.de
Protein Design Group
European Molecular Biology Laboratory
Heidelberg, Europe
Example
=======
For example, you send the following file:
Joe Sequencer, Department of Advanced Protein Research,
National University, Timbuktu
joe@amino.churn.edu
# incredulase from paracoccus dementiae, translated from cDNA
KELVLALYDYQEKSPREVTMKKGDILTLLNSTNKD
WWKVEVNDRQGFVPAAYVKKLD
If your sequence has at least one non-trivial homologue in the database of
protein sequences, you then receive a multiple sequence alignment and the
annotated secondary structure prediction:
Block with multiple sequence alignment.
Block with explanations about the prediction method.
.........1.........2.........3.........4.........5.........6
AA |KELVLALYDYQEKSPREVTMKKGDILTLLNSTNKDWWKVEVNDRQGFVPAAYVKKLD|
PHD| EEEEEE EEEEEE EEEEEE EEEE EEE |
Rel|854777641334566643102441577762566642443213663122112234155
The prediction is made by a new method rated at an expected 71.4% average
accuracy for the three states helix, strand and loop (Rost and Sander, PNAS,
submitted). For proteins with no homologues in the training set, the method
is rated at 5-6 percentage points higher three-state accuracy than any
previously published programmed prediction method.
PHD has three main features:
- its improved accuracy results from the use of evolutionary information
in the form of multiple sequence alignments
- it has a much improved beta-strand prediction as a result of balanced
training
- it has a more realistic distribution of segment lengths
Who are we?
===========
Burkhard Rost wrote the prediction program PHD.
Reinhard Schneider wrote the multiple sequence alignment program MaxHom.
Chris Sander helped in the development of the methods.
Papers to reference in reporting results
========================================
Multiple sequence alignment (MaxHom, HSSP):
Sander, Chris; Schneider, Reinhard: Database of Homology-Derived
Structures and the Structural Meaning of Sequence
Alignment.
Proteins, 1991, Vol. 9, pp. 56-68.
Profile network prediction (PHD):
Rost, Burkhard; Sander, Chris: Prediction of protein secondary
structure at better than 70% accuracy.
J. Mol. Biol., 1993, Vol. 232, pp. 584-599.
Further publications on the subject
===================================
HSSP:
Schneider, Reinhard; Sander, Chris: The HSSP data base of protein
structure-sequence alignment.
Nucl. Acids Res., 1993, in press.
Prediction:
Rost, Burkhard; Sander, Chris: Jury returns on structure prediction.
Nature, 1992, Vol. 360, p. 540.
Rost, Burkhard; Sander, Chris: Improved prediction of protein
secondary structure by use of sequence profiles and
neural networks.
Proc. Natl. Acad. Sci. U.S.A., 1993, Vol. 90, pp. 7558-7562.
Input format
============
Make up a file with the following content:
- Your name, institution and address (one or more lines)
- Your email address (one line)
- Any additional relevant information (one or more lines)
- A one-line description of the protein sequence submitted, which must
start with the symbol "#".
- The amino acid sequence in one letter code (one or more lines of any length,
letters or blanks only, no special symbols or numbers, NOTHING may follow
your sequence).
PredictProtein insists that you adhere to this format. Your input file will
in general only be read by a computer program. Any additional messages to be
read by one of us in person should be sent to:
Predict-Help@EMBL-Heidelberg.de
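The input layout described above can be assembled and checked before mailing. The following is a minimal sketch, not part of the server: the function name, its parameters and the 60-character line width are assumptions chosen for illustration, and only the rules stated in this section are enforced.

```python
def build_submission(address_lines, email, description, sequence,
                     extra_lines=(), width=60):
    """Assemble a PredictProtein-style input file as a single string.

    address_lines: name, institution and address (one or more lines)
    email:         your email address (exactly one line)
    description:   one-line description; a leading "#" is added if missing
    sequence:      one-letter amino acid code, letters and blanks only
    extra_lines:   optional additional information lines
    width:         residues per output line (an arbitrary choice here)
    """
    if not address_lines:
        raise ValueError("at least one name/address line is required")
    if "\n" in email or "\n" in description:
        raise ValueError("email and description must each be one line")
    # Blanks and line breaks are allowed in the sequence; drop them first.
    residues = sequence.replace(" ", "").replace("\n", "")
    if not residues or not (residues.isascii() and residues.isalpha()):
        raise ValueError("sequence may contain letters and blanks only")
    seq_lines = [residues[i:i + width] for i in range(0, len(residues), width)]
    lines = [*address_lines, email, *extra_lines,
             "# " + description.lstrip("# "), *seq_lines]
    # Nothing may follow the sequence, so no trailer is appended.
    return "\n".join(lines)
```

The returned string is the message body; sending it is left to your mail program.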
Output format
=============
The output format is self-documenting.
The first part is in the 'HSSP' format. It contains a list of the reliable
homologues found in the protein sequence database and the multiple sequence
alignment, with residue numbers running vertically down the page. The lines in
this section may be line-wrapped by the email system.
The second part contains important information about the strengths and
limitations of the prediction method. Please read this carefully to avoid
any misunderstandings. The secondary structure prediction is near the end
of the file, in three versions:
(1) prediction for all residues, with an expected (average) three-state
accuracy of 71.4%
(2) prediction for reliably scored residues only, with an expected
three-state accuracy for these residues of 82%
(3) prediction for most reliably scored residues only, with an expected
three-state accuracy for these residues of 92%.
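The idea of reporting the prediction only for reliably scored residues can be sketched as a simple mask over the prediction string, using a per-residue reliability digit like the one in the "Rel" line of the example. This is an illustration only: the function name and the threshold value are hypothetical, and this document does not state which reliability cutoffs the server uses for its 82% and 92% subsets.

```python
def mask_unreliable(prediction: str, reliability: str,
                    threshold: int, mask: str = ".") -> str:
    """Keep predicted states whose reliability digit is >= threshold.

    prediction:  one character per residue (state letter or blank)
    reliability: one digit 0-9 per residue, same length as prediction
    threshold:   hypothetical cutoff; the server's real cutoffs are
                 not given in this document
    mask:        character written in place of low-reliability residues
    """
    if len(prediction) != len(reliability):
        raise ValueError("prediction and reliability must have equal length")
    if any(ch not in "0123456789" for ch in reliability):
        raise ValueError("reliability must consist of digits 0-9 only")
    return "".join(state if int(rel) >= threshold else mask
                   for state, rel in zip(prediction, reliability))
```

Raising the threshold keeps fewer residues, which mirrors the trade-off described above between coverage and expected accuracy.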
Protection of your sequence data
================================
Your sequence data is deposited in files to which only the PHD authors
have read access. No one else inside or outside of EMBL can read your
data, except our systems administrators who are held to the highest
standards of confidentiality. We occasionally look at the data in
order to assess the performance of PHD. If we note anything of
interest, you will be informed and we will not make any information
public without your permission.
Restrictions
============
Do not use for the membrane-surrounded part of membrane proteins, such
as bacteriorhodopsin. The method was trained on water-soluble proteins
and will generally fail for lipid-soluble structures.
If there are few homologous protein sequences in the database with various
levels of sequence similarity to the one you sent, the prediction might
not be superior to other methods.
In the multiple sequence alignment returned to you, only homologues down to
30% identity over 80 or more residues are included. This cutoff is 5
percentage points above the threshold for structural homology (Sander and
Schneider, 1991), in an attempt to stay clear of the twilight zone of
sequence similarity.
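The inclusion rule above (at least 30% identity over 80 or more residues) can be expressed as a small filter on a pairwise alignment. This is a sketch under stated assumptions, not MaxHom's procedure: the function names are hypothetical, gaps are assumed to be written as "-" or ".", and identity and length are assumed to be counted over aligned positions where neither sequence has a gap.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap_chars: str = "-."):
    """Return (percent identity, aligned length) for two aligned sequences.

    Only columns where neither sequence has a gap are counted; this
    counting convention is an assumption made for the example.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
             if a not in gap_chars and b not in gap_chars]
    if not pairs:
        return 0.0, 0
    identical = sum(a == b for a, b in pairs)
    return 100.0 * identical / len(pairs), len(pairs)


def passes_cutoff(aligned_a: str, aligned_b: str,
                  min_identity: float = 30.0, min_length: int = 80) -> bool:
    """Apply the cutoff from the text: >= 30% identity over >= 80 residues."""
    identity, length = percent_identity(aligned_a, aligned_b)
    return identity >= min_identity and length >= min_length
```

A pair aligned over fewer than 80 residues is rejected by this filter even at 100% identity, because the length condition fails.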
Rights
======
All rights reserved, including copyrights.
No commercial use without a license. Contact Chris Sander for
licensing information.
Feedback
========
Address questions or feedback by email to Predict-Help@EMBL-Heidelberg.de,
or by fax to Predict-Help at +49-6221-387517.
The end.