===============================
The PredictProtein email server
===============================

An electronic mail service to which you send an amino acid sequence and
which sends back, by return mail, a multiple sequence alignment and a
secondary structure prediction.

The multiple sequence alignment is performed by a weighted dynamic
programming method called MaxHom (R.Schneider and C.Sander), and the
secondary structure prediction is produced by the profile network method
called PHD (B.Rost and C.Sander). The prediction method is rated at 71.4%
average accuracy for water-soluble globular proteins, in the three states
helix, strand, and loop.

Send sequences to:  PredictProtein@embl-heidelberg.de
Send questions to:  Predict-Help@embl-heidelberg.de

Protein Design Group
European Molecular Biology Laboratory
Heidelberg, Europe

Example
=======

For example, you send the following file:

    Joe Sequencer, Department of Advanced Protein Research,
    National University, Timbuktu
    joe@amino.churn.edu
    # incredulase from paracoccus dementiae, translated from cDNA
    KELVLALYDYQEKSPREVTMKKGDILTLLNSTNKD
    WWKVEVNDRQGFVPAAYVKKLD

If your sequence has at least one non-trivial homologue in the database
of protein sequences, you then receive a multiple sequence alignment and
the annotated secondary structure prediction:

    Block with multiple sequence alignment.
    Block with explanations about the prediction method.

        .........1.........2.........3.........4.........5.........6
    AA |KELVLALYDYQEKSPREVTMKKGDILTLLNSTNKDWWKVEVNDRQGFVPAAYVKKLD|
    PHD| EEEEEE EEEEEE EEEEEE EEEE EEE |
    Rel|854777641334566643102441577762566642443213663122112234155

The prediction is made by a new method rated at an expected 71.4% average
accuracy for the three states helix, strand and loop (Rost and Sander,
PNAS, submitted). For proteins with no homologues in the training set,
the method is rated at 5-6 percentage points higher three-state accuracy
than any previously published programmed prediction method.
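
To make the accuracy figures concrete, here is a minimal sketch of a
per-residue three-state accuracy, assuming it is the percentage of
residues whose predicted state (helix, strand or loop) matches the
observed state. The function name, the use of the letters H, E and L for
the three states, and the two ten-residue strings are illustrative
assumptions, not output of the server; how the service averages this
figure over proteins is not specified here.

```python
def three_state_accuracy(predicted, observed):
    """Percentage of residues whose predicted state equals the observed one.

    Both arguments are strings over the assumed alphabet
    H (helix), E (strand), L (loop), one character per residue.
    """
    if not predicted or len(predicted) != len(observed):
        raise ValueError("strings must be non-empty and of equal length")
    allowed = set("HEL")
    if set(predicted) - allowed or set(observed) - allowed:
        raise ValueError("only the states H, E and L are allowed")
    # Count positions where prediction and observation agree.
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


# Made-up ten-residue example, not taken from any real protein.
predicted = "LEEEELLHHH"
observed = "LEEELLLHHL"
print(three_state_accuracy(predicted, observed))  # prints 80.0
```

In this example eight of the ten positions agree, so the sketch reports
80%.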

PHD has three main features:

- its improved accuracy results from the use of evolutionary information
  in the form of multiple sequence alignments
- it has a much improved beta-strand prediction as a result of balanced
  training
- it has a more realistic distribution of segment lengths

Who are we?
===========

Burkhard Rost wrote the prediction program PHD.
Reinhard Schneider wrote the multiple sequence alignment program MaxHom.
Chris Sander helped in the development of the methods.

Papers to reference in reporting results
========================================

Multiple sequence alignment (MaxHom, HSSP):

    Sander, Chris; Schneider, Reinhard: Database of Homology-Derived
    Structures and the Structural Meaning of Sequence Alignment.
    Proteins, 1991, Vol. 9, pp. 56-68.

Profile network prediction (PHD):

    Rost, Burkhard; Sander, Chris: Prediction of protein structure at
    better than 70% accuracy. J. Mol. Biol., 1993, Vol. 232, pp. 584-599.

Further publications on the subject
===================================

HSSP:

    Schneider, Reinhard; Sander, Chris: The HSSP data base of protein
    structure-sequence alignment. Nucl. Acids Res., 1993, in press.

Prediction:

    Rost, Burkhard; Sander, Chris: Jury returns on structure prediction.
    Nature, Vol. 360, p. 540.

    Rost, Burkhard; Sander, Chris: Improved prediction of protein
    secondary structure by use of sequence profiles and neural networks.
    Proc. Natl. Acad. Sci. U.S.A., 1993, Vol. 90, pp. 7558-7562.

Input format
============

Make up a file with the following content, in this order:

- Your name, institution and address (one or more lines)
- Your email address (one line)
- Any additional relevant information (one or more lines)
- A one-line description of the protein sequence submitted, which must
  start with the symbol "#"
- The amino acid sequence in one-letter code (one or more lines of any
  length, letters or blanks only, no special symbols or numbers; NOTHING
  may follow your sequence)

PredictProtein insists that you adhere to this format.
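
The input format can be checked before a file is mailed. The following is
a minimal sketch in Python; the function build_submission and its
parameter names are hypothetical helpers invented for illustration, not
part of the server, and the checks cover only the rules stated in this
section (how strictly the server itself parses the file is not known).

```python
import re


def build_submission(contact_lines, email, description, sequence,
                     extra_lines=()):
    """Assemble a submission file in the order this section describes."""
    if not contact_lines:
        raise ValueError("give at least one name/institution/address line")
    if "\n" in email or "@" not in email:
        raise ValueError("email address must be one line containing '@'")
    if "\n" in description:
        raise ValueError("description must fit on one line")
    # Only the description line may start with '#'.
    for line in list(contact_lines) + list(extra_lines):
        if line.lstrip().startswith("#"):
            raise ValueError("only the description line may start with '#'")
    # Letters and blanks only; line breaks are allowed because the
    # sequence may span several lines.
    if not re.fullmatch(r"[A-Za-z \n]+", sequence) or \
            not re.search(r"[A-Za-z]", sequence):
        raise ValueError("sequence may contain only letters and blanks")

    lines = list(contact_lines) + [email] + list(extra_lines)
    lines.append("# " + description.lstrip("# ").strip())
    lines.extend(sequence.strip().splitlines())
    # Nothing may follow the sequence, so it is written last and no
    # trailing text is appended.
    return "\n".join(lines)


text = build_submission(
    ["Joe Sequencer, Department of Advanced Protein Research,",
     "National University, Timbuktu"],
    "joe@amino.churn.edu",
    "incredulase from paracoccus dementiae, translated from cDNA",
    "KELVLALYDYQEKSPREVTMKKGDILTLLNSTNKD\nWWKVEVNDRQGFVPAAYVKKLD",
)
print(text)
```

The call reproduces the example file shown earlier in this document; a
sequence containing digits or punctuation raises ValueError instead of
producing a file.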

Your input file will in general only be read by a computer program. Any
additional messages to be read by one of us in person should be sent to:
Predict-Help@EMBL-Heidelberg.de

Output format
=============

The output format is self-documenting.

The first part is in the 'HSSP' format. It contains a list of the
reliable homologues found in the protein sequence database and the
multiple sequence alignment, with residue numbers running vertically down
the page. The lines in this section may be line-wrapped by the email
system.

The second part contains important information about the strengths and
limitations of the prediction method. Please read this carefully to avoid
any misunderstandings.

The secondary structure prediction is near the end of the file:

(1) prediction for all residues, with an expected (average) three-state
    accuracy of 71.4%
(2) prediction for reliably scored residues only, with an expected
    three-state accuracy for these residues of 82%
(3) prediction for most reliably scored residues only, with an expected
    three-state accuracy for these residues of 92%

Protection of your sequence data
================================

Your sequence data is deposited in files to which only the PHD authors
have read access. No one else inside or outside of EMBL can read your
data, except our systems administrators, who are held to the highest
standards of confidentiality. We occasionally look at the data in order
to assess the performance of PHD. If we note anything of interest, you
will be informed, and we will not make any information public without
your permission.

Restrictions
============

Do not use for the membrane-surrounded part of membrane proteins, such as
bacteriorhodopsin. The method was trained on water-soluble proteins and
will generally fail for lipid-soluble structures.

If there are few homologous protein sequences in the database with
various levels of sequence similarity to the one you sent, the prediction
might not be superior to other methods.

In the multiple sequence alignment returned to you, only homologues down
to 30% identity over 80 or more residues are included. This cutoff is 5
percentage points above the threshold for structural homology (Sander and
Schneider, 1990), in an attempt to stay clear of the twilight zone of
sequence similarity.

Rights
======

All rights reserved, including copyrights. No commercial use without a
license. Contact Chris Sander for licensing information.

Feedback
========

Address questions or feedback by email to Predict-Help@EMBL-Heidelberg.de,
or by fax to Predict-Help at +49-6221-387517.

The end.