CAMPS

Browse CAMPS

The browse interface allows you to explore all data stored in the CAMPS database. You can start browsing the website either at the genome level or at the cluster level.

a) Browse genomes

At the genome level, you can access the list of prokaryotic, eukaryotic and viral genomes covered in the CAMPS database. For each genome, the following information is provided:

  • Genome: name of the genome (with link to NCBI)
  • Superkingdom: superkingdom of the genome
  • SC-cluster/FH-cluster/MD-cluster: number of SC-/FH-/MD-clusters that contain membrane proteins from this genome (with links to the respective SC-/FH-/MD-clusters)

b) Browse clusters

At the cluster level, you can choose between the different types of clusters: SC-clusters (whose members are likely to share the same fold), FH-clusters (whose members are likely to share the same function), and MD-clusters (whose members share a sequence identity of at least 30%). The SC-clusters form the first layer of the CAMPS database. FH-clusters and MD-clusters are subclusters of SC-clusters and form the second layer.

c) Cluster report

Each cluster in CAMPS is presented in a cluster report that looks like this:

Screenshot of CAMPS cluster report

A cluster report contains the following information:

  • Overview

    The 'Overview' section presents the main information at a glance: a short description of the cluster, the number of members, the average sequence length, the TMH range, the number of TMH cores, the structural homogeneity, whether structures are available for this cluster, the taxonomic distribution, and the number of subclusters (FH-clusters and MD-clusters).

  • Taxonomy

    The 'Taxonomy' section contains the list of genomes found in the cluster and, for each genomic origin, the number of proteins in the cluster (see column '#Instances').

  • Alignment

    The 'Alignment' section provides precalculated alignments for the cluster that can be viewed interactively using Jalview. The alignments are also annotated with the predicted transmembrane helices and transmembrane helix cores.

    Screenshot of CAMPS alignment

  • FH-clusters/MD-clusters

    If the cluster is an SC-cluster, the 'FH-clusters' and 'MD-clusters' sections list its two types of subclusters. Each subcluster links to its own cluster report.

  • Members

    All membrane proteins that belong to this cluster are listed in the 'Members' section. Each member links to a protein report containing comprehensive information (such as membrane protein topology, structure information, and links to other databases; see below).

  • Structures

    If the cluster is associated with known membrane protein structures, a list of matching PDB proteins can be found in the 'Structures' section.

  • Links

    The 'Links' section contains comprehensive links to other databases, such as GO (Gene Ontology), Pfam, TCDB (Transporter Classification Database), and SCOP. The column 'Members (%)' gives the number and percentage of cluster members associated with each reference.
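
The 'Members (%)' figures amount to a simple count over the cluster members. The sketch below assumes that each member is represented by the set of external references linked to it; the function name, member ids and data layout are hypothetical and only illustrate the arithmetic, not the CAMPS implementation.

```python
from collections import Counter


def members_per_reference(member_links):
    """Return {reference: (member count, percentage of members)}.

    member_links maps a member id to the set of external references
    (e.g. Pfam or GO identifiers) associated with that member.
    """
    total = len(member_links)
    if total == 0:
        return {}
    counts = Counter()
    for refs in member_links.values():
        counts.update(set(refs))  # count each member at most once per reference
    return {ref: (n, round(100.0 * n / total, 1)) for ref, n in counts.items()}


# Hypothetical cluster with four members; the identifiers are illustrative.
cluster = {
    "member1": {"PF00001", "GO:0004930"},
    "member2": {"PF00001"},
    "member3": {"GO:0004930"},
    "member4": set(),
}
print(members_per_reference(cluster))
```

A member without any link (member4 above) still counts towards the total, so the percentages refer to the whole cluster.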

d) Protein report

Each membrane protein in CAMPS is outlined in a protein report that looks like this:

Screenshot of CAMPS protein report

A protein report contains the following information:

  • Proteins

    The 'Proteins' section contains a short description and the genomic origin of the protein and, if available, all proteins that share the same amino acid sequence (100% sequence identity).

  • Clusters

    In the 'Clusters' section you can find all cluster affiliations (SC-cluster, FH-cluster, MD-cluster).

  • Topology

    The 'Topology' section includes the coordinates of the predicted transmembrane helices and the orientation of the protein. The topology is visualized using the TeXtopo program.

  • Sequence

    The 'Sequence' section contains the amino acid sequence of the protein.

  • Structure

    If three-dimensional structures are available for the protein, they are listed in the 'Structure' section (with links to PDB). The PDB proteins listed here are perfect matches to the protein (100% sequence identity). Matches with lower sequence similarity can be found in the 'Links' section.

  • Links

    The 'Links' section contains comprehensive mapping to other databases, such as GenBank, Pfam, GO (Gene Ontology), eggNOG, GPCRDB, PDBTM, DrugBank, OMIM, and TargetDB.


Search CAMPS

The CAMPS database can be searched at two levels: protein level and cluster level. At the protein level, the search returns a list of CAMPS proteins matching the search query. At the cluster level, the list contains matching CAMPS clusters.
Using the drop-down menu provided in the search interface, you can further specify your search query.

Screenshot of CAMPS search interface

a) Search by ID and keyword (protein level)

Five search options are available in the 'ID and keyword' section:

  • GenBank accession

    Short description: GenBank is a comprehensive sequence database of all publicly available DNA sequences and their protein translations.

    If you choose the 'GenBank accession' option, you have to specify a valid GenBank accession (e.g. AAA30674). The CAMPS database will then be searched for proteins that are mapped to this accession.

  • Organism

    Proteins from a specific organism can be retrieved by choosing the option 'Organism' and entering the name of the organism of interest (e.g. Bos taurus).

  • Text search

    If the 'Text search' option is used, proteins containing the specified query term (e.g. rhodopsin) within their description will be returned.

  • UniProtKB accession number

    Short description: UniProt is a comprehensive resource for protein sequence and annotation data.

  • UniProtKB entry name

    You can also search for UniProt proteins within the CAMPS database by either specifying the UniProtKB (UniProt Knowledgebase) accession number (e.g. P02699) or the entry name (e.g. OPSD_HUMAN).

b) Search by annotation (protein level)

Five search options are available in the 'Annotation' section:

  • EC number

    Short description: The Enzyme Commission (EC) number is a numerical classification system for enzymes, based on the chemical reactions they catalyze.

    Using this option, you can search for proteins that have the specified EC number. The EC number can be given at any of its four levels. For example, if the term '1.1' is entered, all proteins with an EC annotation starting with 1.1. are returned, including proteins with the annotations 1.1.1.62, 1.1.5.2, 1.1.99.22, etc. Similarly, the term '1.3.1' yields proteins with the annotations 1.3.1.21, 1.3.1.38, 1.3.1.70, etc.

  • eggNOG

    Short description: eggNOG is a database of orthologous groups of genes.

    This option lets you search for proteins that have the specified eggNOG orthologous group assignment (e.g. meNOG07820).

  • GO

    Short description: The Gene Ontology (GO) project is a major bioinformatics initiative with the aim of standardizing the representation of gene and gene product attributes across species and databases.

    Proteins can also be searched by specifying GO terms (e.g. GO:0016056).

  • Pfam

    Short description: Pfam is a large collection of protein domains and families represented by multiple alignments.

    Proteins predicted to contain a specific Pfam domain (e.g. PF06146) can be retrieved using this search option.

  • SUPERFAMILY

    Short description: The SUPERFAMILY database contains a collection of hidden Markov models representing structural protein domains at the SCOP superfamily level.

    You can also search for proteins that are annotated with a given SUPERFAMILY domain (e.g. SSF46458).
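
The level-wise EC matching described above can be sketched as a comparison of dot-separated components. This is a minimal illustration, assuming that a query matches whole levels only (so '1.1' does not match '1.10.3.1', consistent with "starting with 1.1."); the function name is hypothetical.

```python
def ec_matches(query, ec_number):
    """True if ec_number lies under the (possibly partial) EC query.

    Matching compares whole dot-separated levels, so "1.1" matches
    "1.1.1.62" but not "1.10.3.1". A trailing dot in the query is ignored.
    """
    q = query.strip().rstrip(".").split(".")
    if not 1 <= len(q) <= 4:
        raise ValueError(f"an EC query has 1 to 4 levels, got {query!r}")
    return ec_number.strip().split(".")[: len(q)] == q


annotations = ["1.1.1.62", "1.1.5.2", "1.1.99.22", "1.3.1.21", "1.10.3.1"]
print([ec for ec in annotations if ec_matches("1.1", ec)])
# ['1.1.1.62', '1.1.5.2', '1.1.99.22']
```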

c) Search by cross-reference (protein level)

Proteins from the CAMPS database were mapped to sequences from external databases using the BLAST search engine. You can search for these mappings by specifying one of the following cross-references:

  • CATH

    Short description: CATH is a manually curated classification of protein structures.

    You can search for proteins assigned to a specified CATH class (e.g. 1), CATH architecture (e.g. 1.20), CATH topology (e.g. 1.20.1070) or CATH superfamily (e.g. 1.20.1070.10).

    [Info: The mapping to the CATH database was induced using PDBTM proteins.]

  • DrugBank accession

    Short description: The DrugBank database contains detailed drug and drug target information.

    If you are interested in drugs listed in the DrugBank database, you can use this option by specifying a DrugBank accession number (e.g. DB01728).

  • OMIM id

    Short description: The OMIM database is a comprehensive collection of human genes and genetic disorders.

    If you want to search for CAMPS proteins associated with diseases, you can use this option by specifying an OMIM id either describing a gene or a phenotype (e.g. 219700).

    [Info: The mapping to the OMIM database was induced using UniProt (SwissProt) proteins.]

  • OPM

    Short description: The OPM database contains information about the orientation of membrane protein structures with respect to the membrane bilayer.

    Using this option, you can search for proteins assigned to a specific OPM superfamily (e.g. 1.1.01) or family (e.g. 1.1.01.02).

    [Info: The mapping to the OPM database was induced using PDBTM proteins.]

  • PDBTM id

    Short description: The PDBTM database is a collection of known membrane protein structures.

    Membrane proteins with known structures can be found by specifying a PDBTM id (e.g. 1u19).

  • SCOP

    Short description: SCOP is a manual classification of protein structures.

    You can search for proteins assigned to a specified SCOP class (e.g. f), SCOP fold (e.g. f.13), SCOP superfamily (e.g. f.13.1) or SCOP family (e.g. f.13.1.1).

    [Info: The mapping to the SCOP database was induced using PDBTM proteins.]

  • TargetDB

    Short description: TargetDB is a protein target registration database containing information about the experimental progress and the status of protein targets.

    Membrane protein targets that are included in the TargetDB database can be found in CAMPS by specifying a TargetDB project target id (e.g. BSGCAIR31296).

  • TCDB

    Short description: The TCDB database is a classification system for membrane protein transport systems.

    You can search for proteins assigned to a specified TCDB transporter class (e.g. 3), TCDB transporter subclass (e.g. 3.E), TCDB transporter family (e.g. 3.E.1), TCDB transporter subfamily (e.g. 3.E.1.1) or a specific transporter (e.g. 3.E.1.1.1).

  • VFDB internal id

    Short description: VFDB is a database of virulence factors for bacterial pathogens.

    To check whether a protein contained in VFDB is also included in CAMPS, specify a VFDB internal id (e.g. VFG0042).
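
CATH, SCOP and TCDB identifiers are hierarchical, so the number of dot-separated components tells which level a query addresses. The sketch below encodes only the levels named in the examples above; the table and the function name are illustrative assumptions, not part of CAMPS.

```python
# Level names per database, in order, as listed in the search options above.
LEVELS = {
    "CATH": ("class", "architecture", "topology", "superfamily"),
    "SCOP": ("class", "fold", "superfamily", "family"),
    "TCDB": ("class", "subclass", "family", "subfamily", "transporter"),
}


def hierarchy_level(database, query):
    """Name the classification level that a query such as '1.20.1070' denotes."""
    levels = LEVELS[database]
    parts = query.strip().split(".")
    # Reject empty components ('1..2', '') and queries deeper than the hierarchy.
    if not all(parts) or len(parts) > len(levels):
        raise ValueError(f"{query!r} is not a valid {database} identifier")
    return levels[len(parts) - 1]


for db, query in [("CATH", "1.20.1070"), ("SCOP", "f.13"), ("TCDB", "3.E.1.1.1")]:
    print(db, query, hierarchy_level(db, query))
# CATH 1.20.1070 topology
# SCOP f.13 fold
# TCDB 3.E.1.1.1 transporter
```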

d) Search by sequence (protein level)

You can also search the CAMPS database with an amino acid sequence, given either in FASTA format or in raw format (no header).

Example for FASTA format:

>A0NGX9_ANOGA
MAAFVEPHFDAWTQGSGNMSVVDKVPPEMLHMVHPHWNQFPPMNPLWHSILGFAIFMLGM
ISMTGNGCVMYIFTNTKSLRTPSNLLVVNLAFSDFFMMFTMGPPMVINCWHETWTFGPFA
CELYAMLGSLFGCASIWTMTMIAFDRYNVIVKGLAGKPMTNNGALLRILGVWVFALFWTL
APLFGWNRYVPEGNMTACGTDYLTQTWLSRSYIIIYAIFVYWLPLLTIIYSYTFILKAVS
AHEKNMREQAKKMNVASLRTQEAQNTSTEMKLAKVALVTISLWFMAWTPYLVINFTGIFK
AAPISPLATIWGSLFAKANAVYNPIVYGISHPKYRAALYQKFPSLSCQDAPVDDGQSVAS
GATQASDEKA

Example for raw format:

MAAFVEPHFDAWTQGSGNMSVVDKVPPEMLHMVHPHWNQFPPMNPLWHSILGFAIFMLGM
ISMTGNGCVMYIFTNTKSLRTPSNLLVVNLAFSDFFMMFTMGPPMVINCWHETWTFGPFA
CELYAMLGSLFGCASIWTMTMIAFDRYNVIVKGLAGKPMTNNGALLRILGVWVFALFWTL
APLFGWNRYVPEGNMTACGTDYLTQTWLSRSYIIIYAIFVYWLPLLTIIYSYTFILKAVS
AHEKNMREQAKKMNVASLRTQEAQNTSTEMKLAKVALVTISLWFMAWTPYLVINFTGIFK
AAPISPLATIWGSLFAKANAVYNPIVYGISHPKYRAALYQKFPSLSCQDAPVDDGQSVAS
GATQASDEKA
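
Both input formats reduce to the same bare sequence. The following sketch shows one way to normalize such input; the function name is hypothetical, and the handling choices (ignoring whitespace, upper-casing, rejecting a second header, accepting only the letters A to Z) are assumptions rather than documented CAMPS behavior.

```python
import string

ALLOWED = set(string.ascii_uppercase)


def parse_query(text):
    """Return the bare amino acid sequence from FASTA or raw input."""
    lines = [line.strip() for line in text.strip().splitlines()]
    if lines and lines[0].startswith(">"):
        lines = lines[1:]  # drop the FASTA header line
    if any(line.startswith(">") for line in lines):
        raise ValueError("expected a single sequence, found several FASTA headers")
    # Join the remaining lines and remove any embedded whitespace.
    seq = "".join("".join(line.split()) for line in lines).upper()
    if not seq or not set(seq) <= ALLOWED:
        raise ValueError("expected a non-empty sequence of amino acid letters")
    return seq


fasta = ">A0NGX9_ANOGA\nMAAFVEPHFDAWTQGSGNMS\nVVDKVPPEMLHMVHPHWNQF\n"
raw = "MAAFVEPHFDAWTQGSGNMS\nVVDKVPPEMLHMVHPHWNQF"
print(parse_query(fasta) == parse_query(raw))  # True
```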

Using this option, the database is searched for identical matches (i.e. sequences that are 100% identical to the specified sequence) and for similar matches (using the BLAST search engine). For the BLAST search, you can additionally specify the E-value and the maximum number of hits to be returned.
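
Finding identical matches does not require alignment: a lookup table keyed by the full sequence returns every protein with exactly the same sequence. The sketch covers only this exact-match part (the similar matches are left to BLAST and are not shown); the protein ids and function names are hypothetical.

```python
from collections import defaultdict


def build_index(proteins):
    """Group protein ids by their exact (upper-cased) amino acid sequence."""
    index = defaultdict(list)
    for protein_id, seq in proteins.items():
        index[seq.upper()].append(protein_id)
    return index


def identical_matches(index, query):
    """Return ids of proteins whose sequence is 100% identical to the query."""
    return list(index.get(query.upper(), []))


# Hypothetical database of three proteins, two of which share a sequence.
index = build_index({"prot1": "MAAFVEPHFD", "prot2": "MAAFVEPHFD", "prot3": "MKVLAAGIVG"})
print(identical_matches(index, "maafvephfd"))  # ['prot1', 'prot2']
print(identical_matches(index, "MAAFVEPHF"))   # []
```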

e) Search by ID and keyword (cluster level)

Two options are available in the 'ID and keyword' section:

  • Organism

    Clusters (either SC-, FH-, or MD-clusters) containing proteins from a specific organism can be retrieved by choosing the option 'Organism' and entering the name of the organism of interest (e.g. Bos taurus).

  • Text search

    If the 'Text search' option is used, clusters (either SC-, FH-, or MD-clusters) with proteins containing the specified query term (e.g. rhodopsin) within their description will be returned.

f) Search by cross-reference (cluster level)

This works like the search by cross-reference at the protein level, except that instead of the proteins linked to the specified query term, the clusters containing these proteins are returned.
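
Conceptually, a cluster-level query can reuse the protein-level hits and replace each hit by the cluster it belongs to. The sketch assumes a hypothetical table giving, for each protein, its SC-, FH- and MD-cluster ids (with the further assumption that a protein may lack an FH- or MD-cluster assignment); all ids are invented for illustration.

```python
def clusters_for_hits(protein_hits, cluster_of, cluster_type):
    """Map each cluster of the given type ('SC', 'FH' or 'MD') to its hit proteins."""
    found = {}
    for protein_id in protein_hits:
        cluster_id = cluster_of.get(protein_id, {}).get(cluster_type)
        if cluster_id is not None:  # skip proteins without such a cluster
            found.setdefault(cluster_id, []).append(protein_id)
    return found


# Hypothetical cluster assignments.
cluster_of = {
    "prot1": {"SC": "sc_1", "FH": "fh_1", "MD": "md_1"},
    "prot2": {"SC": "sc_1", "FH": "fh_2"},
    "prot3": {"SC": "sc_2", "FH": "fh_3", "MD": "md_3"},
}
hits = ["prot1", "prot2", "prot3"]  # e.g. proteins linked to one Pfam domain
print(clusters_for_hits(hits, cluster_of, "SC"))  # {'sc_1': ['prot1', 'prot2'], 'sc_2': ['prot3']}
print(clusters_for_hits(hits, cluster_of, "MD"))  # {'md_1': ['prot1'], 'md_3': ['prot3']}
```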


Classify your protein

To obtain a classification for an unknown query sequence, submit your sequence of interest via the CAMPS classification interface. The submission form looks like this:

Screenshot of CAMPS classification interface

You must provide a single amino acid sequence (in FASTA or raw format; see above). You can also enter an optional description of your query sequence. By default, the classification results are sent by email (please make sure that your email address is correct). The results can also be displayed in the browser, but we recommend email notification because a job takes about 5 minutes, depending on the sequence length.

Two search parameters let you constrain the classification results. The '#Hits' parameter specifies the maximum number of hits that will be returned. The 'P-value threshold' restricts the output to hits with a P-value below (i.e. better than) this threshold. Please note that sequences longer than 5000 residues will not be processed.
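
The two result filters and the length limit can be sketched as follows. This is an illustration under assumptions: "better than the threshold" is read as a strictly smaller P-value, the filtered hits are ordered by Z-score (as stated in the description of the results) before being cut to '#Hits', and the Hit class and function names are hypothetical.

```python
from dataclasses import dataclass

MAX_LENGTH = 5000  # longer sequences are not processed


@dataclass
class Hit:
    cluster: str
    z_score: float
    p_value: float


def check_length(seq):
    """Reject sequences above the 5000-residue limit."""
    if len(seq) > MAX_LENGTH:
        raise ValueError(f"{len(seq)} residues exceeds the limit of {MAX_LENGTH}")


def select_hits(hits, max_hits, p_threshold):
    """Keep hits with a P-value below the threshold, best Z-score first, at most max_hits."""
    kept = [hit for hit in hits if hit.p_value < p_threshold]
    kept.sort(key=lambda hit: hit.z_score, reverse=True)
    return kept[:max_hits]


# Hypothetical hits for three SC-clusters.
hits = [Hit("sc_a", 4.1, 0.002), Hit("sc_b", 7.5, 0.0001), Hit("sc_c", 1.2, 0.3)]
print([hit.cluster for hit in select_hits(hits, max_hits=2, p_threshold=0.01)])  # ['sc_b', 'sc_a']
```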

After submission, the sequence is searched against all meta-models, each of which represents one SC-cluster. An internal scoring mechanism then determines the best-fitting meta-models, which are returned. The email containing the classification results looks like this:

Screenshot of CAMPS classification result

The results section contains the names of the matching SC-clusters, their descriptions, and three different scores:

Score:
The query sequence is compared against each meta-model (representing an SC-cluster) using the forward algorithm, which calculates the probability that the meta-model generated the sequence. The score is the natural logarithm of that probability.
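
The score can be illustrated with the forward algorithm on a toy hidden Markov model. The two-state model below (a hydrophobic-leaning state 'M' and a polar-leaning state 'S' over a two-letter alphabet 'h'/'p') is invented for illustration and is not a CAMPS meta-model. The recursion runs in log space so that long sequences do not underflow, and the result is the natural logarithm of P(sequence | model), summed over all hidden state paths.

```python
import math


def _logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_log_score(seq, states, start, trans, emit):
    """Natural log of P(seq | model), summed over all hidden state paths.

    start[s], trans[r][s] and emit[s][symbol] are probabilities (all > 0 here).
    """
    # alpha[s] = log P(symbols read so far, current state is s)
    alpha = {s: math.log(start[s]) + math.log(emit[s][seq[0]]) for s in states}
    for symbol in seq[1:]:
        alpha = {
            s: _logsumexp([alpha[r] + math.log(trans[r][s]) for r in states])
            + math.log(emit[s][symbol])
            for s in states
        }
    return _logsumexp(list(alpha.values()))


# Toy two-state model, invented for illustration (not a CAMPS meta-model).
TOY_STATES = ("M", "S")
TOY_START = {"M": 0.5, "S": 0.5}
TOY_TRANS = {"M": {"M": 0.9, "S": 0.1}, "S": {"M": 0.2, "S": 0.8}}
TOY_EMIT = {"M": {"h": 0.8, "p": 0.2}, "S": {"h": 0.3, "p": 0.7}}

print(round(forward_log_score("h", TOY_STATES, TOY_START, TOY_TRANS, TOY_EMIT), 4))  # -0.5978 (= ln 0.55)
print(forward_log_score("hhhhpp", TOY_STATES, TOY_START, TOY_TRANS, TOY_EMIT))
```

Summing over all state paths (rather than taking only the best path, as the Viterbi algorithm does) is what makes the score the probability that the model generated the sequence.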

P-value:
The P-value is defined as the fraction of non-members with a score at least as good as the one obtained for the match. Here, 'non-members' are all sequences that do not belong to the respective SC-cluster.
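
Read literally, this definition is an empirical P-value. A minimal sketch, assuming that higher (less negative) log-probability scores are better; the scores are made-up numbers and the function name is hypothetical.

```python
def empirical_p_value(match_score, non_member_scores):
    """Fraction of non-member scores that are at least as good as the match score."""
    if not non_member_scores:
        raise ValueError("at least one non-member score is required")
    at_least_as_good = sum(1 for score in non_member_scores if score >= match_score)
    return at_least_as_good / len(non_member_scores)


non_members = [-120.0, -110.0, -100.0, -90.0]  # made-up log-probability scores
print(empirical_p_value(-100.0, non_members))  # 0.5
print(empirical_p_value(-80.0, non_members))   # 0.0
```

Under this definition, a match that outscores every non-member receives a P-value of 0, and the smallest nonzero value is one divided by the number of non-members.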

Z-score:
The Z-score is defined as the difference between the score obtained for the match and the mean score of all non-members, divided by the standard deviation of the non-member scores. Again, 'non-members' are all sequences that do not belong to the respective SC-cluster.
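
The same kind of made-up scores illustrates the Z-score. The definition does not say whether the population or the sample standard deviation is used; the sketch assumes the population version (statistics.pstdev), and the function name is hypothetical.

```python
import statistics


def z_score(match_score, non_member_scores):
    """(match score - mean non-member score) / standard deviation of non-member scores."""
    mean = statistics.fmean(non_member_scores)
    sd = statistics.pstdev(non_member_scores)  # population SD: an assumption
    if sd == 0:
        raise ValueError("non-member scores have zero spread")
    return (match_score - mean) / sd


non_members = [-120.0, -110.0, -100.0, -90.0]  # made-up log-probability scores
print(round(z_score(-80.0, non_members), 4))   # 2.2361
```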

The SC-clusters are sorted by Z-score, with the best-matching SC-cluster listed first. Each link leads to the CAMPS page with further information on that SC-cluster.
If you chose to display the results in the browser, they are presented in a similar fashion:

Screenshot of CAMPS classification result

Technische Universität München - Department of Genome Oriented Bioinformatics