SYNOPSIS

prof [INPUTFILE+] [OPTIONS]

DESCRIPTION

Secondary structure is predicted by a system of neural networks rated at an expected average accuracy above 72% for the three states helix, strand and loop (Rost & Sander, PNAS, 1993, 90, 7558-7562; Rost & Sander, JMB, 1993, 232, 584-599; and Rost & Sander, Proteins, 1994, 19, 55-72; evaluation of accuracy). Evaluated on the same data set, PROFsec is rated at ten percentage points higher three-state accuracy than methods using only single-sequence information, and at more than six percentage points higher than, e.g., a method using alignment information based on statistics (Levin, Pascarella, Argos & Garnier, Prot. Engng., 1993, 6, 849-54). PHDsec predictions have three main features:

1. improved accuracy through evolutionary information from multiple sequence alignments
2. improved beta-strand prediction through a balanced training procedure
3. more accurate prediction of secondary structure segments by using a multi-level system
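
As an illustration of the three-state accuracy quoted above, the following Python sketch computes the percentage of residues whose state (helix, strand or loop) is predicted correctly. The function name three_state_accuracy, the one-letter codes H, E and L, and the example strings are assumptions made for this sketch; it is not the evaluation code of the cited papers.

```python
def three_state_accuracy(observed: str, predicted: str) -> float:
    """Percentage of residues whose state (H, E or L) is predicted correctly.

    Assumption: both strings use one letter per residue and are aligned
    position by position.
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted strings differ in length")
    if not observed:
        raise ValueError("empty input")
    correct = sum(o == p for o, p in zip(observed, predicted))
    return 100.0 * correct / len(observed)


# Illustrative strings only, not taken from any real protein.
observed = "LLHHHHHHLLEEEELL"
predicted = "LLHHHHHLLLEEELLL"
print(f"three-state accuracy = {three_state_accuracy(observed, predicted):.1f}%")
```

Averaging this value over the proteins of a test set gives a per-protein average; pooling all residues first gives a per-residue average. The text above does not say which one is meant, so the sketch leaves that choice to the caller.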

Solvent accessibility is predicted by a neural network method rated at a correlation coefficient (correlation between experimentally observed and predicted relative solvent accessibility) of 0.54, cross-validated on a set of 238 globular proteins (Rost & Sander, Proteins, 1994, 20, 216-226; evaluation of accuracy). The output of the neural network codes for 10 states of relative accessibility. Expressed in units of the difference between prediction by homology modelling (best method) and prediction at random (worst method), PROFacc is some 26 percentage points superior to a comparable neural network using three output states (buried, intermediate, exposed) and using no information from multiple alignments.
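
The correlation coefficient mentioned above can be sketched as a plain Pearson correlation between observed and predicted relative accessibility values. This is a minimal illustration; the function name pearson_correlation and the sample numbers are assumptions, and the cited paper may differ in details such as how residues are pooled.

```python
import math


def pearson_correlation(observed, predicted):
    """Pearson correlation between two equally long sequences of numbers."""
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_o * var_p)


# Made-up relative accessibilities in percent, one value per residue.
observed = [0, 4, 16, 49, 81, 25, 9]
predicted = [1, 9, 16, 36, 64, 36, 4]
print(round(pearson_correlation(observed, predicted), 2))
```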

Transmembrane helices in integral membrane proteins are predicted by a system of neural networks. The shortcoming of the network system is that it often predicts helices that are too long. These are cut by an empirical filter. The final prediction (Rost et al., Protein Science, 1995, 4, 521-533; evaluation of accuracy) has an expected per-residue accuracy of about 95%. The rate of false positives, i.e., transmembrane helices predicted in globular proteins, is about 2%. The neural network prediction of transmembrane helices (PHDhtm) is refined by a dynamic programming-like algorithm. This method resulted in correct predictions of all transmembrane helices for 89% of the 131 proteins used in a cross-validation test; more than 98% of the transmembrane helices were correctly predicted. The output of this method is used to predict topology, i.e., the orientation of the N-terminus with respect to the membrane. The expected accuracy of the topology prediction is above 86%. Prediction accuracy is higher than average for eukaryotic proteins and lower than average for prokaryotes. PHDtopology is more accurate than all other methods tested on identical data sets.

If no output file option (such as --fileRdb or --fileOut) is given, the RDB formatted output is written to ./INPUTFILENAME.prof, where 'prof' replaces the extension of the input file. If the input file has no extension, '.prof' is appended to the input file name.
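
The naming rule above can be sketched in Python as follows. This is an illustration of the rule as stated, not the logic of the prof script itself; the function name default_output_path is hypothetical, and the sketch assumes that only the last extension is replaced and that the file lands in the current directory.

```python
import os.path


def default_output_path(input_file: str) -> str:
    """Derive ./NAME.prof from the input file name.

    Assumption: the extension is whatever os.path.splitext treats as one
    (the part after the last dot); a name without extension is kept whole.
    """
    name = os.path.basename(input_file)
    stem, _extension = os.path.splitext(name)
    return "./" + stem + ".prof"


print(default_output_path("/usr/share/profphd/prof/exa/1ppt.hssp"))
print(default_output_path("query"))
```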

Output format

The RDB format is self-annotating; see example outputs in /usr/share/profphd/prof/exa.
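
A minimal sketch of reading such a file in Python is given below. It assumes a layout that is NOT spelled out on this page: annotation lines starting with '#', followed by a tab-separated header line and tab-separated data rows. The function name read_rdb and the column names in the sample are placeholders chosen for illustration. Classic RDB files may also carry a line of column format definitions directly after the header; if present, it would show up here as the first row and would need to be skipped.

```python
import csv
import io


def read_rdb(text):
    """Split RDB-like text into annotation comments and data rows.

    Assumptions: '#' starts an annotation line, the first non-comment line
    holds the column names, and columns are separated by tabs.
    """
    comments, table_lines = [], []
    for line in text.splitlines():
        if line.startswith("#"):
            comments.append(line[1:].strip())
        elif line.strip():
            table_lines.append(line)
    reader = csv.DictReader(io.StringIO("\n".join(table_lines)), delimiter="\t")
    return comments, list(reader)


# Placeholder content, not real prof output.
sample = "# example annotation\ncol1\tcol2\n1\tA\n2\tB\n"
comments, rows = read_rdb(sample)
print(comments)
print(rows[0]["col2"])
```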

REFERENCES

Rost, B. and Sander, C. (1994a). Combining evolutionary information and neural networks to predict protein secondary structure. Proteins, 19(1), 55-72.
Rost, B. and Sander, C. (1994b). Conservation and prediction of solvent accessibility in protein families. Proteins, 20(3), 216-26.
Rost, B., Casadio, R., Fariselli, P., and Sander, C. (1995). Transmembrane helices predicted at 95% accuracy. Protein Sci, 4(3), 521-33.

OPTIONS

See each keyword for more help. Most of these are likely to be broken.

a

alternative connectivity patterns (default=3)

3

predict sec + acc + htm

acc

predict solvent accessibility, only

ali

add alignment to 'human-readable' PROF output file(s)

arch

system architecture (e.g.: SGI64|SGI5|SGI32|SUNMP|ALPHA)

ascii

write 'human-readable' PROF output file(s)

best

PROF with best accuracy and longest run-time

both

predict secondary structure and solvent accessibility

data

data=<all|brief|normal|detail> for HTML output: only those parts of the prediction are written

debug

keep most intermediate files, print debugging messages

dirWork

work directory, default: a temporary directory from File::Temp::tempdir. Must be fully qualified path. Known to work.

doEval

DO evaluation for list (only for known structures and lists)

doFilterHssp

filter the input HSSP file (excluding some pairs)

doHtmfil

DO filter the membrane prediction (default)

doHtmisit

DO check strength of predicted membrane helix (default)

doHtmref

DO refine the membrane prediction (default)

doHtmtop

DO membrane helix topology (default)

dssp

convert PROF into DSSP format

expand

expand insertions when converting output to MSF format

fast

PROF with lowest accuracy and highest speed

fileCasp

name of PROF output in CASP format (file.caspProf)

fileDssp

name of PROF output in DSSP format (file.dsspProf)

fileHtml

name of PROF output in HTML format (file.htmlProf)

fileMsf

name of PROF output in MSF format (file.msfProf)

fileNotHtm

name of file flagging that no membrane helix was found

fileOut

name of PROF output in RDB format (file.rdbProf). Known to work.

fileProf

name of PROF output in human readable format (file.prof). Broken.

fileRdb

name of PROF output in RDB format (file.rdbProf). Known to work.

fileSaf

name of PROF output in SAF format (file.safProf)

filter

filter the input HSSP file (excluding some pairs)

good

PROF with good accuracy and moderate speed

graph

add ASCII graph to 'human-readable' PROF output file(s)

htm

use: 'htm=<N|0.N>' gives the minimal transmembrane helix detected. The default is 'htm=8' (resp. 'htm=0.8'). Smaller numbers mean more false positives and fewer false negatives!
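
The two spellings of the value ('htm=8' and 'htm=0.8') can be brought to a common scale as in the Python sketch below. This is only an illustration of the equivalence stated above; the function name htm_threshold is hypothetical, and the accepted range (a fraction strictly between 0 and 1 after conversion) is an assumption, not something this page specifies.

```python
def htm_threshold(value: str) -> float:
    """Normalise an htm value given as N or 0.N to the fraction 0.N.

    Assumption: values of 1 or more are the integer form N and are
    divided by 10; the result must lie strictly between 0 and 1.
    """
    number = float(value)
    if number >= 1:
        number = number / 10
    if not 0 < number < 1:
        raise ValueError(f"htm value out of range: {value!r}")
    return number


print(htm_threshold("8"))
print(htm_threshold("0.8"))
```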

html argument

'html' or 'html=<all|body|head>' write HTML format of prediction
'html' will result in the PROF output being converted to HTML
'html=body' restricts HTML file to the HTML_BODY tag part
'html=head' restricts HTML file to the HTML_HEADER tag part
'html=all' gives both HEADER and BODY

keepConv

keep the conversion of the input file to HSSP format

keepFilter argument

<*|doKeepFilter=1> keep the filtered HSSP file

keepHssp argument

<*|doKeepHssp=1> keep the intermediate HSSP file

keepNetDb argument

<*|doKeepNetDb=1> keep the intermediate DbNet file(s)

list argument

<*|isList=1> input file is list of files

msf

convert PROF into MSF format

nice

give 'nice-D' to set the nice value (priority) of the job

noProfHead

do NOT copy file with tables into local directory

noSearch

short for doSearchFile=0, i.e. no searching of DB files

noascii

suppress writing ASCII (i.e. human readable) result files

nohtml

suppress writing HTML result files

nonice

job will not be niced, i.e. not run with lower priority

notEval

DO NOT check accuracy even when structures are known

notHtmfil

do NOT filter the membrane prediction

notHtmisit

do NOT check whether or not membrane helix strong enough

notHtmref

do NOT refine the membrane prediction

notHtmtop

do NOT membrane helix topology

nresPerLineAli

Number of characters per line used for MSF file. Default: 50.

numresMin

Minimal number of residues to run network, otherwise prd=symbolPrdShort. Default: 9.

optJury

Adds PHD to jury. Default: `normal,usePHD'. Many other parameters change the default for this one as a side-effect; the list is not comprehensive: phd, nophd, /^para(3|Both|Sec|Acc|Htm|CapH|CapE|CapHE)/, /^para?/, jct

para3

Parameter file for sec+acc+htm. Default: `<DIRPROF>/net/PROFboth_best.par'.

paraAcc

Parameter file for acc. Default: `<DIRPROF>/net/PROFacc_best.par'.

paraBoth

Parameter file for sec+acc. Default: `<DIRPROF>/net/PROFboth_best.par'.

paraSec

Parameter file for sec. Default: `<DIRPROF>/net/PROFsec_best.par'.

riSubAcc

Minimal reliability index (RI) for subset PROFacc. Default: 4.

riSubSec

Minimal reliability index (RI) for subset PROFsec. Default: 5.

riSubSym

Symbol for residues predicted with RI < riSubSec/Acc. Default: `.'.
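
How the three riSub options interact can be sketched in Python: residues whose reliability index falls below the minimum are replaced by the subset symbol. This is an illustration of the rule as described above, not code from prof; the function name subset_prediction, the one-digit-per-residue RI string, and the example strings are assumptions.

```python
def subset_prediction(prediction: str, reliability: str,
                      min_ri: int = 5, symbol: str = ".") -> str:
    """Keep predicted states with RI >= min_ri, mask the rest with symbol.

    Assumption: reliability holds one digit (0-9) per residue, aligned
    with the prediction string. The default min_ri of 5 mirrors the
    riSubSec default; pass 4 for the riSubAcc default.
    """
    if len(prediction) != len(reliability):
        raise ValueError("prediction and reliability differ in length")
    return "".join(
        state if int(ri) >= min_ri else symbol
        for state, ri in zip(prediction, reliability)
    )


# Made-up prediction and reliability strings.
print(subset_prediction("LLHHHHEELL", "9875342689"))
```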

s_k_i_p

problems, manual, hints, notation, txt, known, DONE, Date, date, aa, Lhssp, numaa, code

saf

convert PROF into SAF format

scrAddHelp
scrGoal

neural network switching

scrHelpTxt

Input file formats accepted: hssp,dssp,msf,saf,fastamul,pirmul,fasta,pir,gcg,swiss

scrIn

list_of_files (or single file) parameter_file

scrName

prof

scrNarg

2

sec

predict secondary structure, only

silent

no information written to screen - this is the default

skipMissing

do not abort if input file missing!

sourceFile

prof

test

is just a test (faster)

translate-jobid-in-param-values

String 'jobid' gets substituted with $par{jobid}

tst

quick run through program, low accuracy

user

user name

--version

Print version

AUTHOR

Rost B, Sander C, Fariselli P, Casadio R, Liu J, Yachdav G, Kajan L.

EXAMPLES

Prediction from alignment in HSSP file for best results

prof /usr/share/profphd/prof/exa/1ppt.hssp fileRdb=/tmp/1ppt.hssp.prof

Prediction from a single sequence

prof /usr/share/profphd/prof/exa/1ppt.f fileRdb=/tmp/1ppt.f.rdbProf
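
When prof is driven from a script, the command lines above can be assembled programmatically. The Python sketch below builds the argument list in the 'keyword=value' style used in these examples; the helper name prof_command is hypothetical, and the sketch only prints the command rather than running it, since it assumes nothing about where prof is installed.

```python
import shlex


def prof_command(input_file, **options):
    """Build an argument list for prof.

    Options are rendered as keyword=value; an option passed as True is
    rendered as a bare keyword (for example 'both').
    """
    argv = ["prof", input_file]
    for key, value in options.items():
        argv.append(key if value is True else f"{key}={value}")
    return argv


cmd = prof_command("/usr/share/profphd/prof/exa/1ppt.hssp",
                   fileRdb="/tmp/1ppt.hssp.prof")
print(shlex.join(cmd))
# On a system with prof installed, the list could be handed to
# subprocess.run(cmd, check=True) to perform the actual prediction.
```

Passing a list to subprocess.run avoids shell quoting problems with file names that contain spaces.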

phd.pl invocation

/usr/share/profphd/prof/embl/phd.pl /usr/share/profphd/prof/exa/1ppt.hssp htm fileOutPhd=/tmp/query.phdPred fileOutRdb=/tmp/query.phdRdb fileNotHtm=/tmp/query.phdNotHtm

ENVIRONMENT

PROFPHDDIR

Override the prof package directory (default: /usr/share/profphd).

RGUTILSDIR

Override the location of librg-utils-perl (default: /usr/share/librg-utils-perl).

FILES

*.rdbProf

default output file extension

/usr/share/profphd/prof

default data directory

BUGS

Please report bugs at <https://rostlab.org/bugzilla3/enter_bug.cgi?product=profphd>.

Prediction from HSSP file fails when residue lines with exclamation marks `!' are present.

Use 'optJury=normal' and 'both' like this:

prof /tmp/1a3q.hssp fileRdb=/tmp/1a3q.hssp.profRdb optJury=normal both

SEE ALSO

Main website

<http://www.predictprotein.org/>

Documentation

<http://www.predictprotein.org/docs.php>

Community website

<http://groups.google.com/group/PredictProtein>

FTP

<ftp://rostlab.org/pub/cubic/downloads/prof>

Newsgroups

<http://groups.google.com/group/PredictProtein>