Genome Information Research Center, Osaka Univ.

===============================
The PredictProtein email server
===============================

PredictProtein is an electronic mail service: you send it an amino acid
sequence, and it sends back by return mail a multiple sequence alignment
and a secondary structure prediction.

The multiple sequence alignment is performed by a weighted dynamic
programming method called MaxHom (R. Schneider and C. Sander), and the
secondary structure prediction is produced by the profile network
method called PHD (B. Rost and C. Sander). The prediction method is
rated at 71.4% average accuracy for water-soluble globular proteins in
the three states helix, strand, and loop.

Send sequences to: PredictProtein@embl-heidelberg.de
Send questions to: Predict-Help@embl-heidelberg.de

Protein Design Group
European Molecular Biology Laboratory
Heidelberg, Europe


Example
=======

For example, you send the following file:


Joe Sequencer, Department of Advanced Protein Research,
National University, Timbuktu
joe@amino.churn.edu
# incredulase from paracoccus dementiae, translated from cDNA
KELVLALYDYQEKSPREVTMKKGDILTLLNSTNKD
WWKVEVNDRQGFVPAAYVKKLD


If your sequence has at least one non-trivial homologue in the database of
protein sequences, you then receive a multiple sequence alignment and the
annotated secondary structure prediction:


Block with multiple sequence alignment.
Block with explanations about the prediction method.
    .........1.........2.........3.........4.........5.........6
AA |KELVLALYDYQEKSPREVTMKKGDILTLLNSTNKDWWKVEVNDRQGFVPAAYVKKLD|
PHD|  EEEEEE                EEEEEE     EEEEEE    EEEE  EEE   |
Rel|854777641334566643102441577762566642443213663122112234155


The prediction is made by a new method rated at an expected 71.4% average
accuracy for the three states helix, strand and loop (Rost and Sander, PNAS,
submitted). For proteins with no homologues in the training set, the method
is rated at 5-6 percentage points higher three-state accuracy than any
previously published programmed prediction method.
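
As an illustration of what a three-state accuracy figure measures, the
sketch below computes the percentage of residues whose predicted state
matches the observed one. It assumes the usual per-residue reading of the
term and an illustrative encoding (H = helix, E = strand, L = loop); the
function name is hypothetical and this is not the authors' evaluation code.

```python
def three_state_accuracy(predicted: str, observed: str) -> float:
    """Percentage of positions where predicted and observed states agree.

    Both strings hold one state letter per residue (assumed encoding:
    H = helix, E = strand, L = loop).
    """
    if not predicted or len(predicted) != len(observed):
        raise ValueError("need two non-empty strings of equal length")
    # Count residues predicted in the same state as observed.
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(predicted)
```

An average figure such as the one quoted above would then be the mean of
this value over a set of test proteins.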

PHD has three main features:

- its improved accuracy results from the use of evolutionary information
  in the form of multiple sequence alignments
- it has a much improved beta-strand prediction as a result of balanced
  training
- it has a more realistic distribution of segment lengths


Who are we?
===========

Burkhard Rost wrote the prediction program PHD.
Reinhard Schneider wrote the multiple sequence alignment program MaxHom.
Chris Sander helped in the development of the methods.


Papers to reference in reporting results
========================================

Multiple sequence alignment (MaxHom, HSSP):

	Sander, Chris; Schneider, Reinhard: Database of Homology-Derived
		Structures and the Structural Meaning of Sequence
		Alignment.
		Proteins, 1991, Vol. 9, pp. 56-68.

Profile network prediction (PHD):

	Rost, Burkhard; Sander, Chris: Prediction of protein structure
		at better than 70% accuracy.
		J. Mol. Biol., 1993, Vol. 232, pp. 584-599.


Further publications on the subject
===================================

HSSP:
	Schneider, Reinhard; Sander, Chris: The HSSP data base of protein
		structure-sequence alignment.
		Nucl. Acids Res., 1993, in press.

Prediction:
	Rost, Burkhard; Sander, Chris: Jury returns on structure prediction.
		Nature, Vol. 360, p. 540.

	Rost, Burkhard; Sander, Chris: Improved prediction of protein
		secondary structure by use of sequence profiles and
		neural networks.
		Proc. Natl. Acad. Sci. U.S.A., 1993, Vol. 90, 7558-7562.


Input format
============

Make up a file with the following content:

- Your name, institution and address (one or more lines)
- Your email address (one line)
- Any additional relevant information (one or more lines)
- A one-line description of the protein sequence submitted that must start
  with the symbol "#".
- The amino acid sequence in one-letter code (one or more lines of any length,
  letters or blanks only, no special symbols or numbers, NOTHING may follow
  your sequence).

PredictProtein insists that you adhere to this format. Your input file will
in general only be read by a computer program. Any additional messages to be
read by one of us in person should be sent to:

        Predict-Help@EMBL-Heidelberg.de
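
A minimal sketch of how a request file in the layout listed above could be
assembled and checked before mailing. The function name and the exact
checks are assumptions made for illustration; the server's own parser may
be stricter or more lenient.

```python
import re


def build_submission(name_lines, email, description, sequence, extra_lines=()):
    """Return the text of a request file in the layout described above."""
    header = [*name_lines, email, *extra_lines]
    if not name_lines:
        raise ValueError("at least one name/institution line is required")
    if "@" not in email or "\n" in email:
        raise ValueError("the email address must be a single line")
    # Assumption: '#' is kept for the description line only, so that the
    # start of the sequence block stays unambiguous.
    if any(line.lstrip().startswith("#") for line in header):
        raise ValueError("only the description line may start with '#'")
    if "\n" in description:
        raise ValueError("the description must fit on one line")

    # One-letter code: letters or blanks only, no digits or other symbols.
    seq_lines = [ln.strip() for ln in sequence.splitlines() if ln.strip()]
    if not seq_lines:
        raise ValueError("the sequence is empty")
    for ln in seq_lines:
        if not re.fullmatch(r"[A-Za-z ]+", ln):
            raise ValueError(f"illegal character in sequence line: {ln!r}")

    # The sequence comes last; only a final newline follows it (assumed
    # to be acceptable to the server).
    body = header + ["# " + description.lstrip("# ")] + seq_lines
    return "\n".join(body) + "\n"
```

The returned text can be pasted into a mail message as its only content.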


Output format
=============

The output format is self-documenting.

The first part is in the 'HSSP' format. It contains a list of the reliable
homologues found in the protein sequence database and the multiple sequence
alignment with residue number running vertically down the page. The lines in
this section may be line-wrapped by the email system.

The second part contains important information about the strengths and
limitations of the prediction method. Please read this carefully to avoid
any misunderstandings. The secondary structure prediction is near the end
of the file, in three versions:

- prediction for all residues, with an expected (average) three-state
  accuracy of 71.4%
- prediction for reliably scored residues only, with an expected
  three-state accuracy for these residues of 82%
- prediction for most reliably scored residues only, with an expected
  three-state accuracy for these residues of 92%
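
The restriction to reliably scored residues can be sketched as follows. The
code assumes rows shaped like the ``PHD|...|`` and ``Rel|...`` lines in the
example, with one reliability digit per residue; the function names and the
threshold value are hypothetical, since this text does not say which ``Rel``
values correspond to the 82% and 92% subsets.

```python
def parse_row(line: str):
    """Split a row such as 'AA |KELV|' into (label, payload).

    The closing bar is treated as optional.
    """
    label, _, payload = line.rstrip("\n").partition("|")
    if payload.endswith("|"):
        payload = payload[:-1]
    return label.strip(), payload


def mask_unreliable(phd: str, rel: str, min_rel: int, blank: str = ".") -> str:
    """Keep predicted states whose reliability digit is >= min_rel.

    Positions below the (hypothetical) threshold are replaced by `blank`.
    """
    if len(phd) != len(rel):
        raise ValueError("prediction and reliability rows differ in length")
    return "".join(s if int(r) >= min_rel else blank for s, r in zip(phd, rel))
```

For instance, ``mask_unreliable("HHEE", "9182", 5)`` keeps the first and
third positions and blanks the other two.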


Protection of your sequence data
================================

Your sequence data is deposited in files to which only the PHD authors
have read access. No one else inside or outside of EMBL can read your
data, except our systems administrators who are held to the highest
standards of confidentiality. We occasionally look at the data in
order to assess the performance of PHD. If we note anything of
interest, you will be informed and we will not make any information
public without your permission.


Restrictions
============

Do not use the service for the membrane-surrounded part of membrane
proteins, such as bacteriorhodopsin. The method was trained on water-soluble
proteins and will generally fail for lipid-soluble structures.

If there are few homologous protein sequences in the database with various
levels of sequence similarity to the one you sent, the prediction might
not be superior to other methods.

In the multiple sequence alignment returned to you, only homologues down to
30% identity over 80 or more residues are included. This cutoff is 5
percentage points above the threshold for structural homology (Sander and
Schneider, 1990), in an attempt to stay clear of the twilight zone of
sequence similarity.
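
Read literally, the inclusion rule above amounts to a simple filter on
alignment hits. The sketch below shows that reading only; the ``Hit``
record and its field names are hypothetical, and the server's actual
selection logic may take further criteria into account.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One aligned database sequence (illustrative record, not an HSSP type)."""
    name: str
    percent_identity: float  # sequence identity to the query, in percent
    aligned_length: int      # number of aligned residues


MIN_IDENTITY = 30.0  # "down to 30% identity"
MIN_LENGTH = 80      # "over 80 or more residues"


def is_included(hit: Hit) -> bool:
    """True if the hit meets both the identity and the length cutoff."""
    return hit.percent_identity >= MIN_IDENTITY and hit.aligned_length >= MIN_LENGTH


def included_hits(hits):
    """Return only the hits that would appear in the returned alignment."""
    return [h for h in hits if is_included(h)]
```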


Rights
======

All rights reserved, including copyrights.
No commercial use without a license. Contact Chris Sander for
licensing information.


Feedback
========

Address questions or feedback by email to Predict-Help@EMBL-Heidelberg.de,
or by fax to Predict-Help at +49-6221-387517.

The end.