General
HelixCorr combines the outcome of several prediction algorithms for correlated mutations into a consensus prediction for helix-helix contacts. In its basic (and recommended) version it includes the algorithms McBASC (using the Miyata and the McLachlan matrix), OMES-KASS, OMES-FODOR and ELSC.
Optionally, results of the two programs CorrMut and CAPS can also be included in the consensus prediction. However, predictions with these two programs can be quite time-consuming for large multiple alignments, require user input for several parameter settings, and in our study often yielded no or only a limited number of significant correlations. Accordingly, users who want to include one or both of these methods need to obtain predictions with these programs independently before they use HelixCorr.
Two more prediction algorithms (namely MI and SCA), which were also evaluated in our study on co-evolving residues in membrane proteins, are not included in HelixCorr at all, since their prediction accuracies for helix-helix contacts were found to be inadequate.
Installation
HelixCorr consists of Perl scripts and ready-to-use Java classes for the prediction algorithms McBASC, OMES-KASS, OMES-FODOR and ELSC. No extra installation is required if the consensus prediction is to be based on these algorithms only.
Please note: if predictions obtained with either CAPS or CorrMut are to be included in the consensus prediction, these programs need to be downloaded and executed separately, as noted above.
Required Input
Two input files and a parameter file are required for the default use of HelixCorr.
Multiple alignment file in FASTA format
The multiple alignment should include full-length sequences (including
the soluble parts of the proteins). The provided implementation of HelixCorr
will extract the transmembrane domains based on the provided TMS file
(see below). The reference sequence (for which the positions of the
transmembrane segments are provided) needs to be the first sequence in the
multiple alignment. For an example see the file
example.fasta
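The alignment requirements above can be checked before running HelixCorr. The following is a minimal Python sketch, not part of HelixCorr itself; the function names read_fasta and check_alignment are hypothetical, and the check that all aligned sequences have equal length is an assumption that holds for any proper multiple alignment.

```python
def read_fasta(text):
    """Return a list of (header, sequence) tuples in file order."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def check_alignment(records):
    """Return the first record (taken as the reference sequence).

    Raises ValueError if the alignment is empty or if the aligned
    sequences differ in length.
    """
    if not records:
        raise ValueError("alignment is empty")
    lengths = {len(seq) for _, seq in records}
    if len(lengths) != 1:
        raise ValueError("aligned sequences differ in length: %s" % sorted(lengths))
    return records[0]


# Toy alignment (made-up sequences, for illustration only).
toy = ">ref\nM-KT\n--LLA\n>hom1\nMAKT\nAALLA\n"
reference = check_alignment(read_fasta(toy))
```

The first record returned is the one HelixCorr would treat as the reference, so the positions in the TMS file must refer to that sequence.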
TMS file for the reference sequence
A tab-delimited file listing the start and end position of each transmembrane
segment in the reference sequence (the first sequence in the multiple
alignment file). Note that the given positions must refer to the
protein sequence without any gaps. For an example see the file
example.tms
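Because the TMS positions refer to the ungapped reference sequence, they have to be translated into alignment columns before the transmembrane parts can be cut out. The sketch below illustrates that translation in Python. It is not HelixCorr's own code, and it assumes a two-column layout (start, tab, end) with 1-based inclusive positions; see example.tms for the actual layout. All function names are hypothetical.

```python
def parse_tms(text):
    """Parse 'start<TAB>end' lines into (start, end) tuples.

    Assumption: two tab-separated integer columns, 1-based, inclusive.
    """
    segments = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        fields = line.split("\t")
        segments.append((int(fields[0]), int(fields[1])))
    return segments


def ungapped_to_column(aligned_ref, gap_chars="-."):
    """Map each 1-based ungapped residue position to its 0-based alignment column."""
    mapping = {}
    position = 0
    for column, char in enumerate(aligned_ref):
        if char not in gap_chars:
            position += 1
            mapping[position] = column
    return mapping


def tms_columns(aligned_ref, segments):
    """Return, for each segment, the alignment columns of its residues."""
    mapping = ungapped_to_column(aligned_ref)
    return [[mapping[p] for p in range(start, end + 1)] for start, end in segments]


# Toy reference row with gaps and two short made-up segments.
aligned_ref = "M-KT--LLA"
segments = parse_tms("2\t3\n5\t6\n")
columns = tms_columns(aligned_ref, segments)
```

In this toy case residue 2 (K) sits in alignment column 2 and residue 5 (the second L) in column 7, which shows why gapped and ungapped numbering must not be mixed.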
Parameter File params.ctl
Input parameters need to be specified using the parameter file
params.ctl. Required parameters are the following:
- Input Alignment: Full path to multiple alignment file
- TMS File: Full path to file specifying TMS positions
- Result Folder: Full path to folder, where results should be stored
- Required Correlations: Minimum number of correlations required per pair of transmembrane helices. Any pair of predicted correlated positions lying on a helix pair with fewer correlated mutations than this threshold will be removed from the consensus prediction
- Use CAPS: Flag (Y/N), if set to Y, CAPS prediction will be included in the consensus prediction
- CAPS Output File: Full path to CAPS output file (example)
- Use CorrMut: Flag (Y/N), if set to Y, CorrMut prediction will be included
- CorrMut Output Folder: Full path to the folder where the CorrMut result files are stored. This folder needs to contain the bootstrap output file called "BSShortNR2.NJ.out" (example) and the MSA file used for CorrMut called "rst.seq" (example)
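The effect of the Required Correlations threshold can be illustrated with a short Python sketch. This is an illustration of the filtering rule as described above, not HelixCorr's implementation. The function names are hypothetical, and treating pairs within the same helix or outside any helix as discarded is an assumption.

```python
from collections import Counter


def helix_of(position, segments):
    """Return the 0-based index of the segment containing position, or None."""
    for index, (start, end) in enumerate(segments):
        if start <= position <= end:
            return index
    return None


def filter_by_helix_pair(pairs, segments, required):
    """Keep pairs whose helix pair carries at least `required` predicted pairs."""
    labelled = []
    for a, b in pairs:
        helix_a, helix_b = helix_of(a, segments), helix_of(b, segments)
        # Assumption: only pairs connecting two different helices count.
        if helix_a is None or helix_b is None or helix_a == helix_b:
            continue
        labelled.append(((a, b), tuple(sorted((helix_a, helix_b)))))
    counts = Counter(helix_pair for _, helix_pair in labelled)
    return [pair for pair, helix_pair in labelled if counts[helix_pair] >= required]


# Made-up segments and predicted pairs, for illustration only.
segments = [(1, 10), (20, 30), (40, 50)]
pairs = [(2, 22), (3, 25), (5, 28), (4, 45)]
kept = filter_by_helix_pair(pairs, segments, required=3)
```

With a threshold of 3, the single pair linking the first and third helix is dropped, while the three pairs linking the first and second helix are kept.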
Running the program
Two different variants of HelixCorr are provided:
- ExtractConsensus.pl: calculates consensus prediction including results from the algorithms McBASC_Miyata, McBASC_McLachlan, OMES_KASS, OMES_Fodor, ELSC and optionally CorrMut and/or CAPS
- ExtractConsensusR.pl: calculates reduced consensus prediction including results from McBASC_Miyata, McBASC_McLachlan, OMES_KASS and optionally CAPS (the four best performing individual algorithms in our study)
To run a consensus prediction with either of the two scripts, change to the HelixCorr directory and type
perl ExtractConsensus.pl
or
perl ExtractConsensusR.pl
Output
Within the specified output folder you will find several output files.
- An additional alignment file (tms.msa), which includes only the transmembrane parts of the proteins as specified by the positions in the input TMS file.
- Result files for the individual prediction methods (mcbasc_mclachlan.out, omes_fodor.out, ...) including correlation scores and significance (if provided by the method) for all pairs of positions. Please note that the number of pairs may differ between methods because positions with gaps are treated differently. Also note that positions correspond to the full-length sequence of the reference protein but include only residues within transmembrane segments.
- The consensus output file (for example consensus_3.out or consensusR_3.out), which is a tab-delimited file containing the best L/5 correlations of each method (L being the length of the transmembrane parts of the protein). Correlations on helix pairs with fewer correlations than the given threshold have been removed (example).
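The "best L/5 correlations of each method" step can be sketched as follows in Python. This is only an illustration under stated assumptions: higher scores are taken to be better, L/5 is rounded down, and the per-method support tally is a hypothetical way of summarizing the selected pairs rather than the rule HelixCorr uses to build its consensus. The method names and scores are toy values.

```python
from collections import defaultdict


def top_pairs(scores, tm_length):
    """Return the tm_length // 5 best-scoring pairs of one method.

    Assumptions: higher score means stronger correlation; L/5 is floored.
    """
    n_keep = tm_length // 5
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [pair for pair, _ in ranked[:n_keep]]


def tally_support(per_method_scores, tm_length):
    """Map each selected pair to the list of methods that selected it."""
    support = defaultdict(list)
    for method, scores in per_method_scores.items():
        for pair in top_pairs(scores, tm_length):
            support[pair].append(method)
    return dict(support)


# Toy scores for two hypothetical methods; L = 10 gives two pairs per method.
per_method_scores = {
    "method_a": {(1, 20): 0.9, (2, 21): 0.5, (3, 22): 0.7},
    "method_b": {(1, 20): 0.2, (2, 21): 0.8, (3, 22): 0.6},
}
support = tally_support(per_method_scores, tm_length=10)
```

Here the pair (3, 22) is selected by both toy methods, while (1, 20) and (2, 21) are each selected by one.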