HELP [BLITZ]                                                     May 12, 1993

Introduction
------------

BLITZ is an automatic electronic mail server for the MPsrch program of Shane Sturrock and John Collins, Biocomputing Research Unit, University of Edinburgh, Scotland [1]. MPsrch allows you to perform sensitive and extremely fast comparisons of your protein sequences against the Swiss-Prot protein sequence database using the Smith and Waterman best local similarity algorithm [2]. It runs on the MasPar family of massively parallel machines; the BLITZ server uses a 4096-processor MasPar MP-1 system.

Searching the entire Swiss-Prot release 23 with a typical query sequence of 400 amino acids takes approximately 40 seconds. Additional time is required to reconstruct the alignments; how much depends on the number of alignments requested. MPsrch is the fastest implementation of the Smith-Waterman algorithm currently available on any machine.

This documentation describes how to use the EMBL BLITZ server to submit MPsrch jobs. The original MPsrch documentation is included in an appendix at the end of this file.

MPsrch answers the question: which sequences in the database are most similar to my query sequence, or contain the regions most similar to it?

How to use BLITZ
----------------

Send a properly formatted electronic mail message containing some of the commands listed below to BLITZ@EMBL-Heidelberg.DE, and the answer will be mailed to you automatically.

If you have any problems using the BLITZ service, or any questions, please send them to:

    NETHELP@EMBL-Heidelberg.DE

Example
-------

Here is a quick summary example (the parameters are explained below):

    TITLE RPC1_Lambd this is a test using the Lambda CI repressor from Swiss-Prot.
    PAM 200
    INDEL 10
    ALIGN 50
    SEQ
    STKKKPLTQE QLEDARRLKA IYEKKKNELG LSQESVADKM GMGQSGVGAL
    FNGINALNAY NAALLAKILK VSVEEFSPSI AREIYEMYEA VSMQPSLRSE
    YEYPVFSHVQ AGMFSPELRT FTKGDAERWV STTKKASDSA FWLEVEGNSM
    TAPTGSKPSF PDGMLILVDP EQAVEPGDFC IARLGGDEFT FKKLIRDSGQ
    VFLQPLNPQY PMIPCNESCS VVGKVIASQW PEETFG
    END

Databases available
-------------------

At present, only the latest release of the Swiss-Prot protein database is available. It is updated four times a year. In the future, Swiss-Prot entries added between releases will also be searchable as a separate cumulative database.

The Input Format
----------------

BLITZ is an automatic server that runs without any human intervention, so it understands only a limited set of commands. These commands are listed below. Defaults are provided for all search parameters except the query sequence itself. Here are some general rules:

  - Your mail message must contain only one command per line.
  - There is only one mandatory command, SEQ, unless you just want to get this help file. All the search parameters are optional, and default values are used for any that you do not specify.
  - Commands may be written in upper case, lower case, or a mixture of both.
  - The order of the commands is not important. However, SEQ is usually the last one, because everything following it is treated as sequence unless an END line is given (see below).
  - Blank lines and extra spaces are allowed.
  - Only one search per mail message is allowed.

Here is a list of valid commands that are accepted by BLITZ:

HELP
    Use this command to request this help document.

PAM n
    PAM matrix, where "n" is a number between 1 and 500. This (protein only) command sets the amino acid weight matrix used to score aligned pairs of amino acids in the search. Such matrices are usually known as "Dayhoff matrices" [3]. The default value of 120 is used if you do not specify a PAM matrix. Initially, just use the default value, i.e. leave this command out.
    Using different PAM values can often have a dramatic effect on which sequences you find in a search. The reasons are explained in more detail under "The PAM setting" later in this document.

INDEL n
    Indel (gap) penalty, where "n" is a small integer, typically in the range 5 to 30. If you omit this parameter, a default that depends on the chosen PAM matrix is used. For the default PAM 120 matrix, the default indel cost is 13 and the value must not be set below 7. If you give an illegal indel cost, a suitable default is used instead. The indel cost actually used in a search is printed in the output file, so that you can change it if desired. Initially, just omit this parameter. Decrease the indel cost to encourage gaps; increase it to discourage them.

ALIGN n
    Number of best-scoring alignments to show. The default is 30 and the maximum is 100.

NAMES n
    Number of scores to report. This may be larger than the ALIGN value.

TITLE s
    A one-line title for the search, where "s" is text. The first word is used as the "name" of the query sequence and the rest of the line as its description. For example,

        TITLE my_sequence test using PAM 120 of my sequence

    uses "my_sequence" as the name and "test using PAM 120 of my sequence" as the description. Do not use single or double quotes in the title!

SEQ (MANDATORY)
    *Everything* following this line up to
      a) the end of the mail message, or
      b) a line starting with the word END
    is treated as part of the sequence. Do not put sequence data on the same line as the SEQ command or the END command. No special format is required for the query sequence, and you may include numbering. If you do not use END, however, make sure to remove all comments and unrelated text, such as mail signatures.

Examples of input files:
------------------------

This example asks for the 50 best alignments, using an indel cost of 10 and a PAM 200 weight matrix.

    TITLE RPC1_Lambd this is a test using the Lambda CI repressor from Swiss-Prot.
    PAM 200
    INDEL 10
    ALIGN 50
    SEQ
      1 STKKKPLTQE QLEDARRLKA IYEKKKNELG LSQESVADKM GMGQSGVGAL
     51 FNGINALNAY NAALLAKILK VSVEEFSPSI AREIYEMYEA VSMQPSLRSE
    101 YEYPVFSHVQ AGMFSPELRT FTKGDAERWV STTKKASDSA FWLEVEGNSM
    151 TAPTGSKPSF PDGMLILVDP EQAVEPGDFC IARLGGDEFT FKKLIRDSGQ
    201 VFLQPLNPQY PMIPCNESCS VVGKVIASQW PEETFG
    END

Alternatively, the example below simply uses the defaults of PAM 120, INDEL 13 and ALIGN 30.

    SEQ
    STKKKPLTQE QLEDARRLKA IYEKKKNELG LSQESVADKM GMGQSGVGAL
    FNGINALNAY NAALLAKILK VSVEEFSPSI AREIYEMYEA VSMQPSLRSE
    YEYPVFSHVQ AGMFSPELRT FTKGDAERWV STTKKASDSA FWLEVEGNSM
    TAPTGSKPSF PDGMLILVDP EQAVEPGDFC IARLGGDEFT FKKLIRDSGQ
    VFLQPLNPQY PMIPCNESCS VVGKVIASQW PEETFG
    END

Current Restrictions
--------------------

  - The maximum number of alignments that you can request is 100.
  - The PAM matrix value must be between 1 and 500 inclusive.
  - The maximum query sequence length is 10000 residues.
  - Only one job per mail request is allowed.

The BLITZ output
----------------

After you send your query to EMBL, you will receive two mail messages from the BLITZ server:

  1) a LOG file indicating the status of your job;
  2) the output from the MPsrch program.

The LOG file produced by the first example above is:

>From: Blitz@EMBL-Heidelberg.DE 5-JAN-1993 17:52:36.92
>To: Joe.Biologist@EMBL-Heidelberg.DE
>CC:
>Subj: Thanks for your call; here's the log ...
>
>
>TITLE RPC1_Lambd this is a test using the Lambda CI repressor from Swiss-Prot.
>PAM 200
>INDEL 10
>ALIGN 50
>SEQ
>
>  1 STKKKPLTQE QLEDARRLKA IYEKKKNELG LSQESVADKM GMGQSGVGAL
>
> 51 FNGINALNAY NAALLAKILK VSVEEFSPSI AREIYEMYEA VSMQPSLRSE
>
>101 YEYPVFSHVQ AGMFSPELRT FTKGDAERWV STTKKASDSA FWLEVEGNSM
>
>151 TAPTGSKPSF PDGMLILVDP EQAVEPGDFC IARLGGDEFT FKKLIRDSGQ
>
>201 VFLQPLNPQY PMIPCNESCS VVGKVIASQW PEETFG
>END
>
>* An MPsrch batch job has been submitted to the BLITZ machine.
>* The following parameters are used:
>* Title: RPC1_Lambd this is a test using the Lambda CI repressor from Swiss-Prot.
>* Library to be searched: Swiss-Prot
>* Number of alignments: 50
>* PAM: 200
>* INDEL: 10
>* The result file will be mailed to you after completion.

The MPsrch output for the first example is shown below. Only the first three results and the last one are included.

The results from MPsrch
-----------------------

>From: Blitz@EMBL-Heidelberg.DE 5-JAN-1993 17:53:55.56
>To: Joe.Biologist@EMBL-Heidelberg.DE
>CC:
>Subj: Results for: RPC1_Lambd this is a test using the Lambda CI repressor from Swiss-Prot.
>
>
>Search started: Tue Jan 5 17:51:00 1993
>
>MPsrch: Version 1.2 - Shane S. Sturrock & John F. Collins 1992.
>        Biocomputing Research Unit, University of Edinburgh, UK.
>
>Title: RPC1_Lambd
>Description: this is a test using the Lambda CI repressor from Swiss-Prot.
>Sequence: 1 STKKKPLTQEQLEDARRLKA..........ESCSVVGKVIASQWPEETFG 236
>
>Parameters: swiss-prot23; PAM 200; Penalty 10; Align 50
>
>Predicted No. is the number of results expected by chance to have a score
>greater than or equal to the score of the result being printed, and is
>derived by analysis of the total score distribution which gave:
>
>  Mean 16.296862; Variance 354.401516; scale 0.045984

MPsrch calculates the mean and variance of the distribution of scores over the entire database. These values are used later to provide significance measures for the strength of the matches found in the search [4].

>
>Result #1
>>RPC1_LAMBD P03034 REPRESSOR PROTEIN CI.

This is the first hit, shown with its one-line Swiss-Prot title. In this case it is the protein we searched with, so it is not surprising that it was found first.

>
>  Score: 1370; Indels: 0; Gaps: 0; Predicted No.: 3.354311e-24;
>  Matches: 236; Mismatches 0; Partials 0;

This is a perfect hit, with no gaps or indels and 236 identical residues (exact matches).
The predicted number is an estimate of the number of sequences that we would expect to find by chance with a score greater than or equal to the observed score of 1370. Not surprisingly, in this case it is extremely small.

>
>         ************************************************************
>Db     1 STKKKPLTQEQLEDARRLKAIYEKKKNELGLSQESVADKMGMGQSGVGALFNGINALNAY 60
>  Q    1 STKKKPLTQEQLEDARRLKAIYEKKKNELGLSQESVADKMGMGQSGVGALFNGINALNAY 60
>
>         ************************************************************
>Db    61 NAALLAKILKVSVEEFSPSIAREIYEMYEAVSMQPSLRSEYEYPVFSHVQAGMFSPELRT 120
>  Q   61 NAALLAKILKVSVEEFSPSIAREIYEMYEAVSMQPSLRSEYEYPVFSHVQAGMFSPELRT 120
>
>         ************************************************************
>Db   121 FTKGDAERWVSTTKKASDSAFWLEVEGNSMTAPTGSKPSFPDGMLILVDPEQAVEPGDFC 180
>  Q  121 FTKGDAERWVSTTKKASDSAFWLEVEGNSMTAPTGSKPSFPDGMLILVDPEQAVEPGDFC 180
>
>         ********************************************************
>Db   181 IARLGGDEFTFKKLIRDSGQVFLQPLNPQYPMIPCNESCSVVGKVIASQWPEETFG 236
>  Q  181 IARLGGDEFTFKKLIRDSGQVFLQPLNPQYPMIPCNESCSVVGKVIASQWPEETFG 236

Stars are used to mark identical residues.

>
>
>Result #2
>>RPC2_BPP22 P03035 REPRESSOR PROTEIN C2.
>
>  Score: 370; Indels: 8; Gaps: 5; Predicted No.: 4.309500e-04;
>  Matches: 72; Mismatches 72; Partials 54;
>
>         ... * . * ... ..*... ... . . *. * *.* *. * . . . .
>Db    13 RRKKLKIRQAALGKMVGVSNVAISQWERSETEPNGENLLALSKALQCSPDYLLKGDLSQT 72
>  Q   25 KKNELGLSQESVADKMGMGQSGVGALFNGINALNAYNAALLAKILKVSVEEFSPSIAREI 84
>
>         * . .* *. **..* * ** . . * * * ** *. .***.
>Db    73 NVAYHS-RHEP--RG--SYPLISWVSAGQWMEAVEPYHKRAIENWHDTTVDCSEDSFWLD 127
>  Q   85 YEMYEAVSMQPSLRSEYEYPVFSHVQAGMFSPELRTFTKGDAERWVSTTKKASDSAFWLE 144
>
>         *.*.*****.* *.*.**.****** * . .*.* *. * *****. *.*. **
>Db   128 VQGDSMTAPAG--LSIPEGMIILVDPEVEPRNGKLVVAKLEGENEATFKKLVMDAGRKFL 185
>  Q  145 VEGNSMTAPTGSKPSFPDGMLILVDPEQAVEPGDFCIARLGGD-EFTFKKLIRDSGQVFL 203
>
>         .********* * .* ..* *. ..
>Db   186 KPLNPQYPMIEINGNCKIIGVVVDAK 211
>  Q  204 QPLNPQYPMIPCNESCSVVGKVIASQ 229
>

The second hit is also a bacteriophage repressor protein, and its score also looks significant. The predicted number of random matches at this score is still very small (0.00043 hits expected by chance).

Partial matches are marked with dots. These are pairs of residues judged "similar" by the weight matrix used, i.e. those with a positive score. Gaps are shown as "-" characters. Each gap is caused by the insertion or deletion of one or more amino acids at one site in one of the sequences. An indel is one position in a gap (a single "-" character), so separate counts are given for indels and gaps.

>
>Result #3
>>UMUD_ECOLI P04153 UMUD PROTEIN.
>
>  Score: 112; Indels: 14; Gaps: 10; Predicted No.: 8.037992e+01;
>  Matches: 41; Mismatches 50; Partials 33;
>
>         . . ** .*.** ** * *.. . . . . ... . *.**
>Db     5 KPADLREIVTFPLFSDLVQCG-FPSPAADYVEQRID-LNQLLIQHPSATYFVKASGDSMI 62
>  Q   93 MQPSLRSEYEYPVFSH-VQAGMFSPELRTFTKGDAERWVSTTKKASDSAFWLEVEGNSMT 151
>
>         * ...** *..**. . **. ** . * *** *** .* . * * *.*. *
>Db    63 --DG---GISDGDLLIVDSAITASHGDIVIAAVDG-EFTVKKLQLRPTVQ--LIPMNSAY 114
>  Q  152 APTGSKPSFPDGMLILVDPEQAVEPGDFCIARLGGDEFTFKKL-IRDSGQVFLQPLNPQY 210
>
>         *. *. ... * * **
>Db   115 SPITISSEDTLDVFGVVI 132
>  Q  211 -PM-IPCNESCSVVGKVI 226
>
>

We can skip the next 46 hits and just look at the 50th.

>
>Result #50
>>TAG1_RAT P22063 AXONAL GLYCOPROTEIN TAG-1 PRECURSOR.
>
>  Score: 86; Indels: 4; Gaps: 4; Predicted No.: 2.809793e+02;
>  Matches: 16; Mismatches 32; Partials 28;
>
>         *.* *.. ** . . * .. .. .... ..*. *. **. . *
>Db    39 PIFEEQPIGLLFPE-ES-A-EDQVTLACRARASPPATYRWKMNGTDMNLEPGSRHQLMGG 95
>  Q  104 PVFSHVQAGMFSPELRTFTKGDAERWVSTTKKASDSAFWLEVEGNSMTAPTGSKPSFPDG 163
>
>         *... * .. ..* . *.*
>Db    96 NLVIMSPTKTQDAGVYQCLA 115
>  Q  164 MLILVDPEQAVEPGDF-CIA 182
>
>Search completed: Tue Jan 5 17:51:44 1993

At this level of match (score = 86) you would expect to see about 280 sequences in the database by chance. There is therefore no evidence that this is a genuine case of homology. On biological grounds it does not look very likely either (a rat axonal glycoprotein).

Retrieving Database Entries
---------------------------

You can easily get a copy of matching sequences from the Swiss-Prot database by using the EMBL File Server. Use the accession number or entry name as given in Swiss-Prot entries and in the BLITZ output. Send a mail message to NETSERV@EMBL-Heidelberg.DE containing one command per line. The general syntax is:

    GET PROT:accnumber
or
    GET PROT:entryname

e.g.

    GET PROT:P22063
or
    GET PROT:RPC1_LAMBD

If you are new to the EMBL File Server, send a mail message to Netserv@EMBL-Heidelberg.DE containing the line HELP to get some introductory information. The File Server offers the latest sequence data, several other databases and free software for molecular biology.

The Algorithm
-------------

MPsrch uses the well-known Smith and Waterman [2] algorithm for searching the database. Your query sequence is compared against all sequences in the database. The best results, as judged by the alignment score, are aligned and included in the output. The algorithm looks for the best "local" match, as determined by the amino acid similarity matrix (the PAM value) and the cost of inserting gaps (the INDEL cost). Only one match per database sequence is recorded, and these matches are ranked to give the best results. By "local", we mean that short matching regions, such as binding sites, can be detected in the middle of long sequences.

The INDEL cost:
---------------

The INDEL cost is a penalty that is subtracted from the alignment score for every residue that has been inserted or deleted in the best local alignment.
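The search and the per-residue INDEL penalty described above can be illustrated with a minimal sketch. It is not MPsrch's parallel MasPar implementation. It only computes the best Smith-Waterman local similarity score for one pair of sequences, charging the INDEL cost once for every inserted or deleted residue, and then ranks a toy database by that score. The scoring function `toy_score` (+5 for an identity, -4 otherwise) is a hypothetical stand-in for a real PAM matrix. All function names are illustrative.

```python
from typing import Callable, Dict, List, Tuple

def toy_score(a: str, b: str) -> int:
    """Hypothetical stand-in for a PAM matrix: +5 for an identity, -4 otherwise."""
    return 5 if a.upper() == b.upper() else -4

def smith_waterman_score(query: str, target: str, indel: int,
                         score: Callable[[str, str], int] = toy_score) -> int:
    """Best local similarity score; each inserted or deleted residue costs `indel`."""
    best = 0
    prev = [0] * (len(target) + 1)            # previous row of the DP matrix
    for i in range(1, len(query) + 1):
        curr = [0] * (len(target) + 1)
        for j in range(1, len(target) + 1):
            curr[j] = max(
                0,                                                 # start a new local match
                prev[j - 1] + score(query[i - 1], target[j - 1]),  # align the two residues
                prev[j] - indel,                                   # query residue against a gap
                curr[j - 1] - indel,                               # target residue against a gap
            )
            best = max(best, curr[j])
        prev = curr
    return best

def rank_database(query: str, database: Dict[str, str], indel: int,
                  top: int) -> List[Tuple[int, str]]:
    """Score each database entry once and return the `top` best (score, name) pairs."""
    scored = [(smith_waterman_score(query, seq, indel), name)
              for name, seq in database.items()]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:top]
```

For example, `rank_database("ACDEFGH", {"HIT": "WWACDFGHWW", "NOISE": "KKKKKK"}, indel=3, top=2)` returns `[(27, 'HIT'), (0, 'NOISE')]`. In the hit, six identities (+30) outweigh the single-residue gap (-3). With `indel=10` the same hit scores 20, which shows how a higher INDEL cost discourages gaps.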
There is a lower limit on the allowed INDEL value. If it is set too low, the alignment will be filled with many short gaps and will be biologically meaningless. This lower limit depends on the particular PAM setting used. The default INDEL cost will work best in most cases. Reducing this cost encourages gaps; increasing it discourages them.

The default and lower-limit INDEL costs for a range of PAM values are given below:

                      PAM     Default    INDEL cost
                  setting  INDEL cost   lower limit
                 ----------------------------------
                        1          53            27
                       20          29            15
                       40          22            11
                       60          19            10
                       80          16             8
                      100          14             7
DEFAULT value ->      120          13             7
                      150          11             6
                      200           9             5
                      250           7             4
                      300           6             3
                      400           5             3
                      500           4             2

The PAM setting:
----------------

Dayhoff and co-workers [3] produced a series of amino acid weight matrices to help detect distant similarities between proteins. These matrices, known as PAM matrices, give weights to the different possible pairs of aligned residues. The weights can be positive or negative. You can choose a particular PAM matrix by specifying a value between 1 and 500.

Roughly, low PAM values (e.g. 40 or so) are best suited to finding short regions of very strong similarity, while high values (e.g. 250 or more) are better suited to finding longer, weaker matches. The default value of 120 is a compromise that works well in practice if you decide to use only one setting.

The list of top-scoring sequences frequently varies greatly with the PAM value used. If you do not find any similarity to a sequence with the default setting, try some other PAM values.

Getting further help
--------------------

Enquiries about the MPsrch software or the algorithms used should be sent to John Collins or Shane Sturrock (e-mail mpsrch_help@biocomp.ed.ac.uk).

Enquiries about the operation of the BLITZ server should be sent to NetHelp@EMBL-Heidelberg.DE

Literature
----------

[1] Sturrock, S.S. and Collins, J.F. (1993) MPsrch version 1.3.
    Biocomputing Research Unit, University of Edinburgh, UK.

[2] Smith, T.F. and Waterman, M.S. (1981) Identification of common molecular subsequences. J. Mol. Biol. 147:195-197.

[3] Dayhoff, M., Schwartz, R.M. and Orcutt, B.C. (1978) In: Dayhoff, M. (ed.) Atlas of Protein Sequence and Structure, vol. 5, suppl. 3, pp. 345-352.

[4] Collins, J.F. and Coulson, A.F.W. (1990) Significance of protein sequence similarities. In: Doolittle, R.F. (ed.) Methods in Enzymology, vol. 183, pp. 474-486. Academic Press.


APPENDIX I  The original documentation for MPsrch
-------------------------------------------------

This document is specific to running MPsrch interactively on a local machine. The details differ from those for running MPsrch via the EMBL BLITZ server, but the algorithm and results are identical.

                         MPsrch V1.3 User Guide

                    S. S. Sturrock and J. F. Collins
                       Biocomputing Research Unit
                        University of Edinburgh
                  (e-mail sss or jfc@biocomp.ed.ac.uk)
                                6.1.93

1. Introduction

MPsrch is an implementation of the Smith/Waterman Best Local Similarity algorithm for the MasPar family of parallel machines. It will run on any MasPar configuration, from the minimum 1024-processor MP-1 system up to a 16384-processor MP-2 system. MPsrch can cope with any database size for the foreseeable future. Typical search times on a 4096-processor MP-1 are about 40 seconds for a 377-residue query against Swiss-Prot release 23. This corresponds to 84 million cell updates per second, and peaks of more than 130 million can be attained with larger query sequences.

2. Running Searches

MPsrch may be run in two ways: interactively, or via command-line arguments. If any arguments are missing, the program prompts the user for them. If the batch flag is set, it does not prompt and uses defaults where available instead. There are no defaults for the database, query name and output file. With the -b flag set, no output is sent to the display.
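The rules above for missing arguments can be summarised in a short sketch. It is an illustration under stated assumptions, not MPsrch's own code. The function name `resolve_arguments` and the `prompt` callback are hypothetical. Raising an error for a missing database, query or output file in batch mode is also an assumption: the guide only says that no defaults exist for those three.

```python
from typing import Callable, Dict

# Argument names from the MPsrch argument list (without the leading '-').
ARGUMENTS = ("dbase", "query", "pams", "indel", "output", "align")
# The guide states that there are no defaults for these three.
NO_DEFAULT = {"dbase", "query", "output"}

def resolve_arguments(given: Dict[str, object], defaults: Dict[str, object],
                      batch: bool,
                      prompt: Callable[[str], str] = input) -> Dict[str, object]:
    """Fill in missing arguments: prompt interactively, or use defaults in batch mode."""
    resolved: Dict[str, object] = {}
    for name in ARGUMENTS:
        if name in given:
            resolved[name] = given[name]
        elif not batch:
            resolved[name] = prompt(f"Enter -{name}: ")   # interactive: ask the user
        elif name not in NO_DEFAULT and name in defaults:
            resolved[name] = defaults[name]               # batch: fall back to a default
        else:
            # Hypothetical behaviour: the guide does not say what happens here.
            raise ValueError(f"-{name} is required in batch mode (no default)")
    return resolved
```

For example, `resolve_arguments({"dbase": "swiss", "query": "actin.pep", "output": "foo"}, {"pams": 100, "indel": 14, "align": 100}, batch=True)` fills in the remaining three values from the defaults. These default values are only illustrative, taken from the recommended starting point and the example run. They are not MPsrch's actual built-in defaults.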
2.1 Arguments available

    -dbase.....Database to be searched
    -query.....Filename of query to be used (FASTA format)
    -pams......No. of Dayhoff PAMs in 1-500 range for comparison score table
    -indel.....Indel penalty; range varies with PAM table and stringency
    -output....Filename for output
    -align.....Number of alignments required in output file
    -batch.....Suppresses screen output and user interaction

The first letter of each of these may be used instead of the whole argument.

2.2 An Example Search

The following is the screen display of a typical search using command-line options. The query used is an actin.

    maspar> MPsrch -q actin.pep -d swiss -o foo -p 100 -i 14 -a 100
    Welcome to MPsrch (Version 1.3)
    (Copyright) Shane S. Sturrock & John F. Collins
    Biocomputing Research Unit, University of Edinburgh, UK

    Query sequence 377 residues
    Starting search
    0.............................100%
    ----------------------------->
    MasPar Time: 40.621008 seconds.
    Mean 47.168988; Variance 76.572116; scale 0.616007
    Aligned: 100

At this point the output file is in the user's directory. The query sequence is displayed for reference in the output header. It is shown in full if it is less than 50 residues long; otherwise only its first 20 and last 20 residues are shown.

    maspar> more foo
    Search started: Wed Jan 6 11:39:01 1993

    MPsrch: Version 1.3 - Shane S. Sturrock & John F. Collins 1993.
            Biocomputing Research Unit, University of Edinburgh, UK.

    Title: No title supplied
    Description: No descr supplied
    Sequence: 1 mcdedettalvcdngsglvk..........witkqeydeagpsivhrkcf 377

    Parameters: swiss-prot23; PAM 100; Penalty 14; Align 100

    Predicted No. is the number of results expected by chance to have a score
    greater than or equal to the score of the result being printed, and is
    derived by analysis of the total score distribution which gave:

    Statistics: Mean 47.168988; Variance 76.572116; scale 0.616007

    RESULT 1
    Score 3334; Predicted No. 0.000000e+00;
    ID   ACTS_HUMAN     STANDARD;      PRT;   377 AA.
    DE   ACTIN, ALPHA SKELETAL MUSCLE.
    Matches 377; Mismatches 0; Partials 0; Indels 0; Gaps 0;

             ************************************************************
    Db     1 MCDEDETTALVCDNGSGLVKAGFAGDDAPRAVFPSIVGRPRHQGVMVGMGQKDSYVGDEA 60
    Q      1 mcdedettalvcdngsglvkagfagddapravfpsivgrprhqgvmvgmgqkdsyvgdea 60

             ************************************************************
    Db    61 QSKRGILTLKYPIEHGIITNWDDMEKIWHHTFYNELRVAPEEHPTLLTEAPLNPKANREK 120
    Q     61 qskrgiltlkypiehgiitnwddmekiwhhtfynelrvapeehptllteaplnpkanrek 120

             ************************************************************
    Db   121 MTQIMFETFNVPAMYVAIQAVLSLYASGRTTGIVLDSGDGVTHNVPIYEGYALPHAIMRL 180
    Q    121 mtqimfetfnvpamyvaiqavlslyasgrttgivldsgdgvthnvpiyegyalphaimrl 180

             ************************************************************
    Db   181 DLAGRDLTDYLMKILTERGYSFVTTAEREIVRDIKEKLCYVALDFENEMATAASSSSLEK 240
    Q    181 dlagrdltdylmkiltergysfvttaereivrdikeklcyvaldfenemataasssslek 240

             ************************************************************
    Db   241 SYELPDGQVITIGNERFRCPETLFQPSFIGMESAGIHETTYNSIMKCDIDIRKDLYANNV 300
    Q    241 syelpdgqvitignerfrcpetlfqpsfigmesagihettynsimkcdidirkdlyannv 300

             ************************************************************
    Db   301 MSGGTTMYPGIADRMQKEITALAPSTMKIKIIAPPERKYSVWIGGSILASLSTFQQMWIT 360
    Q    301 msggttmypgiadrmqkeitalapstmkikiiapperkysvwiggsilaslstfqqmwit 360

             *****************
    Db   361 KQEYDEAGPSIVHRKCF 377
    Q    361 kqeydeagpsivhrkcf 377

For brevity, only the first result is shown. The query sequence is always the lower of the two sequences shown in each alignment. Note that there are separate values for indels and gaps. A gap is one or more adjacent indels, and the implementation used guarantees to minimise the number of gaps while maintaining the correct score for the Smith-Waterman algorithm. The alignments displayed are produced by the same algorithm used for the searches.

Query sequences can be in upper or lower case, so the user can highlight regions of interest by mixing cases. Query sequences may be up to the present maximum of 10000 residues in length.
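The distinction between indels and gaps can be made concrete with a minimal sketch that counts both from the aligned rows of an alignment display. This is not MPsrch's code. It assumes the caller passes the display rows of each sequence in order, and it joins them first so that a gap wrapped across a line break is counted once. The example data are the rows of the second hit (RPC2_BPP22) in the BLITZ results earlier in this document, whose header reports "Indels: 8; Gaps: 5". The function names are illustrative.

```python
import re
from typing import List, Tuple

def count_indels_and_gaps(aligned: str) -> Tuple[int, int]:
    """Return (indels, gaps) for one aligned sequence: '-' characters and maximal runs of them."""
    runs = re.findall(r"-+", aligned)
    return sum(len(run) for run in runs), len(runs)

def alignment_indels_and_gaps(db_rows: List[str], query_rows: List[str]) -> Tuple[int, int]:
    """Total indels and gaps over both sequences of an alignment, given its display rows."""
    indels = gaps = 0
    for rows in (db_rows, query_rows):
        i, g = count_indels_and_gaps("".join(rows))  # join so a wrapped gap counts once
        indels += i
        gaps += g
    return indels, gaps

# Rows of Result #2 (RPC2_BPP22) from the BLITZ output.
DB_ROWS = [
    "RRKKLKIRQAALGKMVGVSNVAISQWERSETEPNGENLLALSKALQCSPDYLLKGDLSQT",
    "NVAYHS-RHEP--RG--SYPLISWVSAGQWMEAVEPYHKRAIENWHDTTVDCSEDSFWLD",
    "VQGDSMTAPAG--LSIPEGMIILVDPEVEPRNGKLVVAKLEGENEATFKKLVMDAGRKFL",
    "KPLNPQYPMIEINGNCKIIGVVVDAK",
]
Q_ROWS = [
    "KKNELGLSQESVADKMGMGQSGVGALFNGINALNAYNAALLAKILKVSVEEFSPSIAREI",
    "YEMYEAVSMQPSLRSEYEYPVFSHVQAGMFSPELRTFTKGDAERWVSTTKKASDSAFWLE",
    "VEGNSMTAPTGSKPSFPDGMLILVDPEQAVEPGDFCIARLGGD-EFTFKKLIRDSGQVFL",
    "QPLNPQYPMIPCNESCSVVGKVIASQ",
]

print(alignment_indels_and_gaps(DB_ROWS, Q_ROWS))  # (8, 5), matching "Indels: 8; Gaps: 5"
```

Counting runs rather than characters is what separates the two figures: "--RG--" contributes four indels but only two gaps.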
The search parameters that can be varied are the PAM table and the indel penalty. The user may determine which settings give the most sensitive results in any given case. We recommend starting with 100 PAMs and an indel penalty of 14 (note that the indel penalty is given as a positive number).

The predicted number is usually a good guide to the likely biological interest of an alignment. However, short motifs or patterns may sometimes be recognised as conveying functional information even though their scores are low and their predicted numbers apparently rather high.

Here is another example, this time with the largest sequence in the Swiss-Prot 23 database.

    maspar> MPsrch -q rynr.seq -d swiss -o foo -p 100 -i 14 -a 20
    Welcome to MPsrch (Version 1.3)
    (Copyright) Shane S. Sturrock & John F. Collins
    Biocomputing Research Unit, University of Edinburgh, UK

    Query sequence 5037 residues
    Starting search
    0.............................100%
    ----------------------------->
    MasPar Time: 343.918292 seconds.
    Mean 61.006553; Variance 148.332142; scale 0.411283
    Aligned: 20
    maspar> more foo
    Search started: Wed Jan 6 11:46:32 1993

    MPsrch: Version 1.3 - Shane S. Sturrock & John F. Collins 1993.
            Biocomputing Research Unit, University of Edinburgh, UK.

    Title: RYNR_RABIT
    Description: RYANODINE RECEPTOR, SKELETAL MUSCLE.
    Sequence: 1 MGDGGEGEDEVQFLRTDDEV..........CWDFFPAGDCFRKQYEDQLS 5037

    Parameters: swiss-prot23; PAM 100; Penalty 14; Align 20

    Predicted No. is the number of results expected by chance to have a score
    greater than or equal to the score of the result being printed, and is
    derived by analysis of the total score distribution which gave:

    Statistics: Mean 61.006553; Var 148.332142; scale 0.411283

    RESULT 1
    Score 44624; Predicted No. 0.000000e+00;
    ID   RYNR_RABIT     STANDARD;      PRT;  5037 AA.
    DE   RYANODINE RECEPTOR, SKELETAL MUSCLE.
    Matches 5037; Mismatches 0; Partials 0; Indels 0; Gaps 0;

             ************************************************************
    Db     1 MGDGGEGEDEVQFLRTDDEVVLQCSATVLKEQLKLCLAAEGFGNRLCFLEPTSNAQNVPP 60
    Q      1 MGDGGEGEDEVQFLRTDDEVVLQCSATVLKEQLKLCLAAEGFGNRLCFLEPTSNAQNVPP 60
    ......................[continues]
             *********************************************************
    Db  4981 EEHNLANYMFFLMYLINKDETEHTGQESYVWKMYQERCWDFFPAGDCFRKQYEDQLS 5037
    Q   4981 EEHNLANYMFFLMYLINKDETEHTGQESYVWKMYQERCWDFFPAGDCFRKQYEDQLS 5037

    ......................[some results omitted]

    RESULT 4
    Score 490; Predicted No. 3.586030e-52;
    ID   PCD6_MOUSE     STANDARD;      PRT;   500 AA.
    DE   PROTEIN PCD-6 (FRAGMENT).
    Matches 67; Mismatches 53; Partials 36; Indels 3; Gaps 2;

             * *.. . *. .. * . . *.*.***.** . *. .* *. **..*. *** *
    Db   269 ETEQDKEHTCETLLMCIVTVLSHGLRSGGGVGDVLRKPSKEE-PLFAARVIYDLLFFFMV 327
    Q   4867 EDEDEPDMKCDDMMTCYLFHMYVGVRAGGGIGDEIEDPAGDEYELY--RVVFDITFFFFV 4924

             *.*.* .* *.***.*..** . . * . * *****. * ** ** * ****
    Db   328 IIIVLNLIFGVIIDTFADLRSEKQKKEEILKTTCFICGLERDKFDNKTVTFEEHIKEEHN 387
    Q   4925 IVILLAIIQGLIIDAFGELRDQQEQVKEDMETKCFICGIGSDYFDTTPHGFETHTLEEHN 4984

             . .*. *.. . ** ** ** **** * .** * **
    Db   388 MWHYLCFIVLVKVKDSTEYTGPESYVAEMIRERNLDWFP 426
    Q   4985 LANYMFFLMYLINKDETEHTGQESYVWKMYQERCWDFFP 5023

Experiment with various PAM and penalty settings to see which give results you find plausible. Beware of combining PAM settings close to the 500 limit with very low penalties, because this reduces the sensitivity of the search to regions of good homology. However, try it and see. Sequences that have diverged so far that a 500 PAM table could be used usually contain short, strong regions of alignment. These can be more easily distinguished from noise using a less extreme PAM setting.

Note that '*' means an identity (match); '.' is a positive substitution (partial); ' ' is a negative substitution (mismatch) or an insertion/deletion (indel).
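The marker conventions just described can be sketched as follows. Each aligned column is classified as an identity ('*'), a positive substitution ('.'), or a negative substitution or indel (' '), and the Matches, Mismatches and Partials counts follow from that classification. MPsrch makes this classification with the chosen PAM matrix. The residue groups in `TOY_GROUPS` below are a hypothetical stand-in, so markers computed with them will not reproduce real MPsrch output. Residues are compared case-insensitively because query sequences may be given in lower case. All names are illustrative.

```python
from typing import Callable, Dict

# Hypothetical similarity groups standing in for positive PAM scores.
TOY_GROUPS = ("ILVM", "DE", "KR", "ST", "FYW", "NQ")

def toy_positive(a: str, b: str) -> bool:
    """True if two residues fall in the same toy similarity group."""
    return any(a in group and b in group for group in TOY_GROUPS)

def marker_line(db: str, query: str,
                positive: Callable[[str, str], bool] = toy_positive) -> str:
    """Build the '*', '.', ' ' line printed above each pair of aligned rows."""
    marks = []
    for a, b in zip(db.upper(), query.upper()):
        if a == "-" or b == "-":
            marks.append(" ")        # insertion/deletion (indel)
        elif a == b:
            marks.append("*")        # identity (match)
        elif positive(a, b):
            marks.append(".")        # positive substitution (partial)
        else:
            marks.append(" ")        # negative substitution (mismatch)
    return "".join(marks)

def match_counts(db: str, query: str,
                 positive: Callable[[str, str], bool] = toy_positive) -> Dict[str, int]:
    """Matches, Mismatches and Partials; indel columns are not counted as mismatches."""
    marks = marker_line(db, query, positive)
    pairs = sum(1 for a, b in zip(db, query) if a != "-" and b != "-")
    matches, partials = marks.count("*"), marks.count(".")
    return {"Matches": matches, "Mismatches": pairs - matches - partials,
            "Partials": partials}
```

Leaving indel columns out of the mismatch count is consistent with the second BLITZ hit above. There, 72 matches, 72 mismatches and 54 partials make 198 residue pairs, and together with 8 indels these account for all 206 alignment columns.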
If you have any problems using the BLITZ service, or any questions, please send them to:

    NETHELP@EMBL-Heidelberg.DE

Shane S. Sturrock was supported by a grant from the Human Genome Mapping Project. The Biocomputing Research Unit was supported by the Darwin Trust of Edinburgh.