CAMPS (Computational Analysis of the Membrane Protein Space)
is an automatic classification of α-helical membrane protein sequences with at least three transmembrane helices (TMHs)
using both sequence clustering and structure prediction.
The figure below shows an overview of the method. First, we generated a dataset of sequences with at least three predicted TMHs
from completely sequenced genomes. We then clustered these sequences using the Markov clustering approach to obtain initial
clusters. For each initial cluster, we generated a multiple alignment and extracted consecutive regions where at least 35% of the
aligned sequences were predicted to be within a TMH (TMH cores).
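The TMH core extraction described above can be illustrated with a minimal sketch. It is not the CAMPS implementation: the function name `tmh_cores`, the mask encoding ('H' for a residue predicted within a TMH, any other character otherwise) and the reading of 35% as a minimum fraction are assumptions made for illustration only.

```python
from typing import List, Tuple


def tmh_cores(tmh_masks: List[str], min_fraction: float = 0.35) -> List[Tuple[int, int]]:
    """Return half-open column ranges (start, end) of candidate TMH cores.

    tmh_masks holds one string per aligned sequence, all of alignment width.
    'H' marks a column where that sequence is predicted to be within a TMH
    (hypothetical encoding); gaps and non-TMH residues use any other character.
    """
    if not tmh_masks:
        return []
    width = len(tmh_masks[0])
    if any(len(mask) != width for mask in tmh_masks):
        raise ValueError("all masks must have the same alignment width")

    n_seqs = len(tmh_masks)
    cores: List[Tuple[int, int]] = []
    start = None  # first column of the run currently being extended
    for col in range(width):
        in_tmh = sum(mask[col] == "H" for mask in tmh_masks)
        if in_tmh / n_seqs >= min_fraction:
            if start is None:
                start = col
        elif start is not None:
            cores.append((start, col))
            start = None
    if start is not None:  # a run that reaches the end of the alignment
        cores.append((start, width))
    return cores


masks = [
    "..HHHH....HHH.",
    "...HHH....HHH.",
    "..........HH..",
]
print(tmh_cores(masks))
```

With three sequences, a column qualifies only when at least two of them are predicted to be within a TMH, so the toy alignment yields two cores.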
Using these TMH cores and a method based on high-order hidden Markov models (meta-models), we selected
from the set of initial membrane protein clusters those clusters whose members are structurally homogeneous. These clusters were
termed SC-clusters (structurally correlated) and form the first clustering layer of the CAMPS database. Each SC-cluster was further
subdivided into two types of subclusters: functionally homogeneous clusters (FH-clusters) and modeling distance clusters (MD-clusters).
Members of an FH-cluster have the same or a similar domain architecture (i.e. the sequential order of domains) and are thus likely
to have the same function. Members of an MD-cluster share a sequence identity of at least 30%. Together, the FH-clusters and
MD-clusters form the second layer of the CAMPS database.
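The 30% identity criterion for MD-clusters can be made concrete with a small sketch. How CAMPS actually computes sequence identity is not stated here, so the definition below (identical residues divided by the number of alignment columns where neither sequence has a gap) and the names `percent_identity` and `within_modeling_distance` are assumptions for illustration.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two aligned sequences ('-' denotes a gap).

    Assumption: only columns where both sequences have a residue are counted.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == "-" or res_b == "-":
            continue  # skip columns containing a gap
        compared += 1
        if res_a == res_b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def within_modeling_distance(aligned_a: str, aligned_b: str, threshold: float = 30.0) -> bool:
    """True if the pair meets the identity threshold (30% in the text above)."""
    return percent_identity(aligned_a, aligned_b) >= threshold


print(percent_identity("ACDE-F", "ACDQGF"))
print(within_modeling_distance("AAAA", "CCCC"))
```

Other identity definitions (for example, dividing by the length of the shorter sequence) would give different values for the same pair, so the denominator should be fixed before comparing against the threshold.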
Finally, membrane protein sequences from the SC-, FH- and MD-clusters were mapped to several external databases including other
membrane protein resources (such as TCDB and
GPCRDB) and biomedical databases (such as
Thus, CAMPS is a structural bioinformatics resource for α-helical membrane proteins that operates on
three cluster levels: structure (SC-clusters), function (FH-clusters) and modeling distance (MD-clusters).
CAMPS is particularly useful for target selection in structural genomics. This is important because membrane proteins account for
less than 2% of the proteins of known structure in the Protein Data Bank (PDB).
Thus, SC-clusters that are not yet associated with a known structure might be of interest to structural
Furthermore, the number of SC-clusters provides an upper bound on the number of existing membrane protein folds. This makes it
possible to estimate the number of folds in nature and to explore the structural variety of membrane proteins.
Finally, CAMPS may also serve as a reference for studies on membrane protein evolution, mutations and diseases. The latter
analyses are particularly relevant because membrane proteins are of great pharmaceutical interest.
CAMPS currently contains 413,714 membrane protein sequences from 1,253 genomes (Archaea: 57, Bacteria: 792, Eukarya: 134, Viruses: 270).
At the first layer, there are 1,353 SC-clusters, 53 of which are associated with a known structure. At the second layer (subclusters of
SC-clusters), there are 1,319 FH-clusters and 7,047 MD-clusters.
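As a quick consistency check on these figures, the short sketch below recomputes the genome total and the share of SC-clusters with a known structure. All numbers are taken from the paragraph above; the variable names are illustrative only.

```python
# Genome counts per kingdom, as listed above.
genomes = {"Archaea": 57, "Bacteria": 792, "Eukarya": 134, "Viruses": 270}
total_genomes = sum(genomes.values())

sc_clusters = 1353
sc_with_structure = 53
sc_without_structure = sc_clusters - sc_with_structure
pct_with_structure = 100 * sc_with_structure / sc_clusters

print(f"genomes: {total_genomes}")
print(f"SC-clusters without a known structure: {sc_without_structure}")
print(f"SC-clusters with a known structure: {pct_with_structure:.1f}%")
```

The per-kingdom counts add up to the stated 1,253 genomes, and roughly 3.9% of SC-clusters are associated with a known structure, leaving 1,300 as candidates for target selection.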