CAMPS

CAMPS (Computational Analysis of the Membrane Protein Space) is an automatic classification of α-helical membrane protein sequences with at least three transmembrane helices (TMHs) using both sequence clustering and structure prediction.

The figure below shows an overview of the method. First, we generated a dataset of sequences with at least three predicted TMHs from completely sequenced genomes. We then grouped these sequences into initial clusters using the Markov clustering approach. For each initial cluster, we generated a multiple alignment and extracted consecutive regions where 35% of the aligned sequences were predicted to be within a TMH (TMH cores).
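The TMH core extraction step can be illustrated with a minimal sketch. It is not the CAMPS implementation: the input format (one mask string per aligned sequence, with `H` marking a predicted TMH residue), the function name `tmh_cores`, and the reading of the 35% figure as an "at least" threshold are all assumptions made for illustration.

```python
def tmh_cores(tmh_masks, min_fraction=0.35):
    """Return half-open (start, end) column ranges of candidate TMH cores.

    tmh_masks: equal-length strings, one per aligned sequence, where 'H'
    marks an alignment column predicted to lie within a TMH (assumed format).
    min_fraction: assumed to mean "at least this fraction of sequences".
    """
    if not tmh_masks:
        return []
    n_seqs = len(tmh_masks)
    n_cols = len(tmh_masks[0])
    if any(len(mask) != n_cols for mask in tmh_masks):
        raise ValueError("all masks must have the alignment length")

    cores = []
    start = None  # first column of the run currently being extended
    for col in range(n_cols):
        in_tmh = sum(mask[col] == "H" for mask in tmh_masks)
        if in_tmh / n_seqs >= min_fraction:
            if start is None:
                start = col
        elif start is not None:
            cores.append((start, col))
            start = None
    if start is not None:  # a run that reaches the end of the alignment
        cores.append((start, n_cols))
    return cores


# Hypothetical toy alignment of three sequences.
example_masks = [
    "--HHHH----HHH-",
    "---HHH----HHH-",
    "----------HH--",
]
example_cores = tmh_cores(example_masks)
```

With three sequences, a column needs two TMH predictions to pass the threshold, so a column supported by a single sequence is excluded from the core.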
Using these TMH cores and a method based on high-order hidden Markov models (meta-models), we selected from the initial membrane protein clusters those whose members are structurally homogeneous. These clusters were termed SC-clusters (structurally correlated) and form the first clustering layer of the CAMPS database. Each SC-cluster was further subdivided into two types of subclusters: functionally homogeneous clusters (FH-clusters) and modeling distance clusters (MD-clusters). Members of an FH-cluster have the same or a similar domain architecture (i.e. the sequential order of domains) and are thus likely to have the same function. Members of an MD-cluster share a sequence identity of at least 30%. Together, the FH-clusters and MD-clusters form the second layer of the CAMPS database.
Finally, membrane protein sequences from the SC-, FH- and MD-clusters were mapped to several external databases including other membrane protein resources (such as TCDB and GPCRDB) and biomedical databases (such as OMIM).
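The 30% sequence identity criterion for MD-clusters can be made concrete with a small sketch. The text does not specify how identity is computed or how MD-clusters are built, so the definitions here are assumptions: identity is counted over alignment columns where neither sequence has a gap, and the helper `within_modeling_distance` only checks whether every pair in a given group meets the threshold. Both function names are hypothetical.

```python
from itertools import combinations

GAP = "-"


def percent_identity(seq_a, seq_b):
    """Percent identity of two aligned sequences.

    Assumption: only columns where neither sequence has a gap are compared.
    Other definitions (e.g. relative to the shorter sequence) are possible.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for res_a, res_b in zip(seq_a, seq_b):
        if res_a == GAP or res_b == GAP:
            continue
        compared += 1
        if res_a == res_b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


def within_modeling_distance(aligned_seqs, threshold=30.0):
    """True if every pair of sequences reaches the identity threshold."""
    return all(
        percent_identity(a, b) >= threshold
        for a, b in combinations(aligned_seqs, 2)
    )
```

For example, `within_modeling_distance(["ACDEFGHIKL", "ACDWWWWWWW"])` holds because the two toy sequences agree at 3 of 10 compared positions, which is exactly the 30% threshold.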

Overview of CAMPS pipeline

Thus, CAMPS is a structural bioinformatics resource for α-helical membrane proteins operating on three cluster levels: structure (SC-clusters), function (FH-clusters) and modeling distance (MD-clusters).

CAMPS is particularly useful for target selection in structural genomics. This is important since membrane proteins account for less than 2% of the proteins of known structure in the Protein Data Bank (PDB). Thus, SC-clusters that are not yet associated with a known structure might be of interest to structural genomics researchers.
Furthermore, our SC-clusters represent an upper bound for the number of existing membrane protein folds. Thus, it is possible to estimate the number of folds in nature and to explore the structural variety of membrane proteins.
Finally, CAMPS may also serve as a reference for studies on membrane protein evolution, mutations and diseases. The latter analyses are particularly relevant because membrane proteins are of great pharmaceutical interest.

CAMPS currently contains 413,714 membrane protein sequences from 1,253 genomes (Archaea: 57, Bacteria: 792, Eukarya: 134, Viruses: 270). At the first layer, there are 1,353 SC-clusters, 53 of which are associated with a known structure. At the second layer (subclusters of SC-clusters), there are 1,319 FH-clusters and 7,047 MD-clusters.
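As a quick consistency check on these figures, the short sketch below recomputes the genome total and the share of SC-clusters that are associated with a known structure. It uses only the numbers stated above; the variable names are illustrative.

```python
# Genome counts per group, as stated in the text.
genomes = {"Archaea": 57, "Bacteria": 792, "Eukarya": 134, "Viruses": 270}
total_genomes = sum(genomes.values())

# SC-cluster counts, as stated in the text.
sc_clusters = 1353
sc_with_structure = 53

# SC-clusters without a known structure: candidate targets in the sense
# described for structural genomics above.
sc_without_structure = sc_clusters - sc_with_structure
percent_with_structure = 100.0 * sc_with_structure / sc_clusters

print(f"genomes: {total_genomes}")
print(f"SC-clusters without known structure: {sc_without_structure}")
print(f"SC-clusters with known structure: {percent_with_structure:.1f}%")
```

The per-group genome counts add up to the stated total, and roughly 4% of SC-clusters currently have an associated structure.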

Technische Universität München - Department of Genome Oriented Bioinformatics