Assessment of an epitope cross-reactivity in human tissues
Help
Overview of server function and output results
The web-server performs assessment of your T cell therapy target (epitope or peptide antigen) for potential cross-reactivity (CR) in human tissues, using gene expression and proteomics databases.
After you submit an epitope sequence and select its associated allele type (details in the Input section below), the results page shows a separate table for each database. The table for 'PaxDB', for example, contains the protein abundances of the matching natural epitopes (rows) in the human tissues (columns).
For the visualization, in the form of a radar (or bar) plot, we use the sums of the expression or abundance values in each tissue column. A table with the CR-indices for all of the databases is given at the bottom of the results page.
Input
Epitope Sequence and Number of Mismatches
The two main input parameters are the epitope amino-acid sequence and the number of allowed mismatches. In the given forms, please provide a string of amino acids in one-letter code and the number of allowed mismatches as an integer value. With the default setting of zero mismatches, only exact matches to the given epitope are returned.
Please note that the amino acid sequence must be at least seven positions long in order to avoid excessive database matches. For the same reason, the number of mismatches cannot exceed half the length of the provided epitope. Furthermore, MHC binding scores can only be provided for epitopes with a length between 8 and 14 positions, due to the implementation of the underlying software.
If you want to define fixed positions in the epitope input sequence, check the corresponding box and provide the sequence in a case-sensitive manner: upper case marks a fixed position and lower case a flexible one. E.g., KvaeLvhfL defines a nonamer in which the first, fifth and last positions cannot be modified.
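The input rules above can be sketched in Python as follows. This is a minimal illustration, not the server's code: the function names `parse_query` and `count_mismatches` are hypothetical, and the candidate peptides in the usage lines are invented.

```python
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def parse_query(seq, mismatches, fixed_positions=False):
    """Validate a query; return (upper-case sequence, set of fixed 0-based indices)."""
    if not set(seq.upper()) <= AMINO_ACIDS:
        raise ValueError("sequence must use the one-letter amino acid code")
    if len(seq) < 7:
        raise ValueError("sequence must have at least seven positions")
    if mismatches < 0 or mismatches > len(seq) / 2:
        raise ValueError("mismatches cannot exceed half the sequence length")
    # With the box checked, upper-case letters mark fixed positions;
    # otherwise all positions are treated as flexible.
    fixed = {i for i, c in enumerate(seq) if c.isupper()} if fixed_positions else set()
    return seq.upper(), fixed


def count_mismatches(query, fixed, candidate):
    """Return the number of mismatches, or None if a fixed position differs."""
    if len(candidate) != len(query):
        return None
    count = 0
    for i, (q, c) in enumerate(zip(query, candidate)):
        if q != c:
            if i in fixed:
                return None
            count += 1
    return count


query, fixed = parse_query("KvaeLvhfL", 2, fixed_positions=True)
print(sorted(i + 1 for i in fixed))                  # 1-based fixed positions
print(count_mismatches(query, fixed, "KVLELVHFL"))   # differs at flexible position 3
print(count_mismatches(query, fixed, "KVAEYVHFL"))   # differs at fixed position 5
```

A candidate is accepted only if the returned count is not None and does not exceed the number of allowed mismatches.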
Proteasomal Cleavage Prediction
Cleavage probability is predicted using the program NetChop 3.1, with the prediction method set to "C-term 3.0" for predicting the boundaries of cytotoxic T cell (CTL) epitopes. It uses a neural network trained on a database containing 1,260 publicly available MHC class I ligands. To calculate the cleavage probability for the current epitope, we use the method described in the original paper by Keşmir et al. (2002), which combines two terms: Pc, the probability that the peptide is cleaved exactly at the C-terminus, and Pcon, the probability that the rest of the peptide stays intact. These terms are computed from Oi, the output of the network for position i of the peptide. The default value of 0.7 is used for the parameter t, as suggested by the authors; however, users can provide their own threshold value as a floating point number between 0 and 1 in the provided input form.
Because the overall cleavage probability is a product, it becomes very small for longer input sequences. Hence it is advisable to rely on this score only for peptides in the range of seven to eleven amino acids, as that is the epitope size for which NetChop has been most extensively tested, although the calculation itself does not limit the input to a certain length.
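As a rough illustration of why such a product shrinks with peptide length, here is a minimal Python sketch. It assumes that the overall probability is Pc * Pcon, with Pc taken as the network output at the C-terminal position and Pcon as the product of (1 - Oi) over all other positions. These specific forms are assumptions of this sketch and not necessarily the server's exact formula; the threshold t is deliberately left out, and the output values are invented rather than produced by NetChop.

```python
from math import prod


def cleavage_probability(outputs):
    """Combine per-position outputs O_i (last entry = C-terminus) into one score.

    Assumed form, for illustration only:
    P = P_c * P_con, with P_c = O_n and P_con = product of (1 - O_i) for i < n.
    """
    *internal, c_term = outputs
    p_c = c_term
    p_con = prod(1.0 - o for o in internal)
    return p_c * p_con


# Invented outputs: weak internal cleavage signals, strong C-terminal signal.
nonamer = [0.2] * 8 + [0.9]
pentadecamer = [0.2] * 14 + [0.9]
print(cleavage_probability(nonamer))
print(cleavage_probability(pentadecamer))  # smaller, although each position looks the same
```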
TAP Affinity Prediction
Peters et al. (2003) have established a 9 x 20 matrix, mat_{i,j}, that contains, for each amino acid at every possible position of a nonamer epitope, a log(IC50) value; these values can be summed up to obtain an IC50 for the complete peptide.
When testing the divergence between predicted and experimentally tested IC50 values, the authors concluded that the best concordance is achieved when taking precursor peptides into account, i.e. instead of the initial nonamer they calculated the affinity for an N-terminally elongated sequence. As the length of the epitopes submitted to our server is defined by the users, we modified the established formula to work without precursor sequences. In the modified formula, N_i denotes the i-th amino acid from the N-terminus of the given peptide and α is a weight that determines the influence of the N-terminal residues on the overall affinity score. Hence, only the IC50 value for the C-terminal residue and a weighted sum of the values for the three N-terminal amino acids are used for the scoring. The authors experimentally determined the best value for the weight α to be 0.2, but the user can change it in the Expitope web server to another floating point number.
Although it is technically possible to score peptides of any length ≥ 4 with this approach, it has to be kept in mind that the matrices are constructed on the basis of nonamer epitopes and have only been extensively tested on those or on potential precursors. When analyzing longer peptides, the returned values might not reflect the real affinity to TAP, and it could be beneficial to exclude the N-terminus in such cases.
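The scoring idea described above (the C-terminal residue plus an α-weighted sum over the three N-terminal residues) can be sketched in Python as below. The exact formula used by the server is not reproduced here: the form `c_term + alpha * n_term_sum` is an assumption of this sketch, `tap_score` is a hypothetical function name, and the matrix values are placeholders invented for illustration, not the published values of Peters et al.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Placeholder 9 x 20 matrix (position x amino acid) of log(IC50)-like values.
# All entries are invented; the real matrix is given in Peters et al. (2003).
toy_mat = [{aa: 0.0 for aa in AMINO_ACIDS} for _ in range(9)]
toy_mat[0]["K"] = -0.5
toy_mat[1]["V"] = -0.3
toy_mat[2]["L"] = 0.1
toy_mat[8]["V"] = -1.0


def tap_score(peptide, mat, alpha=0.2):
    """Score a peptide from its C-terminal and three N-terminal residues.

    Assumed form: mat[9th position][C-terminal residue]
                  + alpha * sum over i = 1..3 of mat[i-th position][N_i].
    """
    if len(peptide) < 4:
        raise ValueError("peptide must have at least four residues")
    n_term_sum = sum(mat[i][peptide[i]] for i in range(3))
    c_term = mat[8][peptide[-1]]
    return c_term + alpha * n_term_sum


print(tap_score("KVLEYVIKV", toy_mat))             # default weight 0.2
print(tap_score("KVLEYVIKV", toy_mat, alpha=0.0))  # C-terminal residue only
```

Because only four residues enter the score, peptides longer than nine amino acids are scored without error, which is why the caution about longer peptides applies.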
MHC Binding Affinity
To predict the binding affinity of the epitopes to the major histocompatibility complex class I (MHC-I) for a large range of HLA alleles, we integrated the portable version of NetMHC 4.0. The tool uses artificial neural networks trained on different MHC alleles and returns the affinity of a given peptide to the specified alleles as IC50 values in nM. Due to the size limitations implemented in NetMHC, only peptides with a length between 8 and 14 amino acids can be used for affinity prediction. The authors explicitly state that predictions for peptides longer than eleven positions have not been extensively validated and that caution should be taken with octamer predictions, as some alleles might not bind them to any significant extent. Expitope users can submit a selection of multiple HLA types for affinity prediction, between one and all; the default allele is HLA-A0201.
Combined Score
To sort all matches found for the user's query with regard to their real potential to function as an epitope, we apply a scoring function as proposed by Keşmir et al. (2002). It combines the probability that a given peptide is cleaved from its original sequence, transported to the endoplasmic reticulum and bound by MHC class I proteins. The resulting score Q is defined as:
Q = P / (A_TAP * A_MHC)
where P is the proteasomal cleavage probability and the A-terms are affinities in IC50 values to the transporter associated with antigen processing (TAP) and the MHC complex as explained above.
Amino acid sequence: single-letter code; lower case specifies a flexible position, upper case a fixed one. The sequence field is followed by a check box, which should be selected if the sequence distinguishes flexible and fixed positions (default: all positions flexible).
Combined score threshold (Q): the suggested value of 0.02 selects the top 9% of the HLA binders (ensuring binding with 86% certainty, as shown statistically). The default value of 1e-4 selects the top 50% of binders.
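As a worked example, the combined score can be recomputed from the three individual scores using Q = P / (A_TAP * A_MHC), the formula given in the description of the 'Combined score' column. The Python sketch below uses the values from the example PaxDB output row for KVLEYVIKV (cleavage score 4.454E-3, TAP score 2.355E-1, MHC score 5). The function name `combined_score` is hypothetical, and treating the threshold as a lower bound on Q is an assumption of this sketch.

```python
def combined_score(p_cleavage, a_tap, a_mhc):
    """Q = P / (A_TAP * A_MHC)."""
    return p_cleavage / (a_tap * a_mhc)


# Values taken from the example PaxDB output row (epitope KVLEYVIKV).
q = combined_score(4.454e-3, 2.355e-1, 5.0)
print(f"{q:.3E}")  # 3.783E-03, matching the combinedScore in that row

# Assumed use of the threshold: keep matches whose Q reaches the chosen value.
for threshold in (1e-4, 0.02):
    print(threshold, q >= threshold)
```

With these numbers the match passes the default threshold of 1e-4 but not the stricter suggested value of 0.02.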
iCrossR - calculation of cross-reactivity index and profile
The box 'Compute Cross-Reactivity' can be checked to perform the calculation as described in the iCrossR publication (Jaravine, V., et al. Assessment of cancer and virus antigens for cross-reactivity in human tissues. Bioinformatics (Oxford, England) 2017;33(1):104-111.).
After you submit an epitope sequence and select its associated allele type, the results page shows a summary of the results in the form of a table (called the CR-profile), containing the sums of the abundances of the matching natural epitopes in the tissues. The profile is a 4x22 table, which is used for the visualization map (PNG image). The CR-index is calculated from the profile using the formulae given in the Methods section of the publication.
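The per-tissue sums that make up one line of such a profile can be sketched in Python as follows. Only the column summation is shown; the CR-index formulae from the publication are not reproduced. The function name `tissue_sums` is hypothetical and the abundance values are invented for illustration.

```python
def tissue_sums(rows, tissues):
    """Sum the abundance values of all matching natural epitopes per tissue."""
    return {tissue: sum(row[tissue] for row in rows) for tissue in tissues}


tissues = ["Liver", "Testis", "Cell line"]

# Invented abundances (ppm) for two matching natural epitopes.
rows = [
    {"Liver": 7.5, "Testis": 1.5, "Cell line": 10.0},
    {"Liver": 0.0, "Testis": 3.25, "Cell line": 0.5},
]

for tissue, total in tissue_sums(rows, tissues).items():
    print(tissue, total)
```

The same sums are what the radar or bar plot displays for each tissue.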
Example
The example can be invoked by pressing the ‘EVDPASNTY’ button. It sets the sequence to EVDPASNTY, sets the number of mismatches to 3, and selects the allele HLA-A0101. Threshold and weight can be left at their default values. Press the “Submit” button to submit the job. The calculation usually takes about 2 minutes, depending on the number of matches between the given epitope and all human proteins. This epitope comes from one of the cancer-testis (CT) antigens, which are usually expressed in Testis/Ovary/Female gonad tissues.
Output
After the job has finished, the results page shows the table and its visualization for the first database. The view can be changed with the database selection list and the plot type selection list.
At the bottom of the page there is a link to download a file with the raw output of the program (in text format), and a link to the same data in Excel SLK format.
Databases: descriptions, titles, references
Table. Sources of gene expression and protein abundance data.
Data source | ID | Name | Number of tissues | Type | References
PaxDB | Pax4 | PaxDB v4.0 | 22 | protein abundance | (Wang, et al., 2015)
Expression atlas | E-Prot-3 | Human Protein Atlas | 44 | protein abundance | (Petryszak, et al., 2016), (Uhlen, et al., 2015)
Expression atlas | E-Prot-1 | Human Proteome Map | 23 | protein abundance | (Petryszak, et al., 2016), (Kim, et al., 2014)
Expression atlas | E-Mtab-513 | Illumina Body Map | 16 | gene expression | (Petryszak, et al., 2016), (Barbosa-Morais, et al., 2012)
Expression atlas | E-Mtab-5214 | GTEx | 53 | gene expression | (Petryszak, et al., 2016), (Lonsdale, et al., 2013)
Wang et al, 2008 | Wang | Wang 2008 | 7 | gene expression | (Wang, et al., 2008)
Expression atlas | E-Mtab-3358 | FANTOM5 RIKEN | 56 | gene expression | (Petryszak, et al., 2016), (The FANTOM5 project, 2014)
References
For the databases:
Wang, M., et al. Version 4.0 of PaxDb: Protein abundance data, integrated across model organisms, tissues, and cell-lines. Proteomics 2015;15(18):3163-3168.
Petryszak, R., et al. Expression Atlas update--an integrated database of gene and protein expression in humans, animals and plants. Nucleic acids research 2016;44(D1):D746-752.
Uhlen, M., et al. Proteomics. Tissue-based map of the human proteome. Science (New York, N.Y.) 2015;347(6220):1260419.
Kim, M.-S., et al. A draft map of the human proteome. Nature 2014;509(7502):575-581.
Barbosa-Morais, N.L., et al. The evolutionary landscape of alternative splicing in vertebrate species. Science (New York, N.Y.) 2012;338(6114):1587-1593.
Lonsdale, J., et al. The Genotype-Tissue Expression (GTEx) project. Nat Genet 2013;45(6):580-585.
Wang, E.T., et al. Alternative isoform regulation in human tissue transcriptomes. Nature 2008;456(7221):470-476.
The Fantom Consortium (Riken). A promoter-level mammalian expression atlas. Nature 2014;507(7493):462-470.
Other references:
Boegel, S., et al. HLA typing from RNA-Seq sequence reads. Genome Medicine 2012;4(12):102.
Haase, K., et al. Expitope: a web server for epitope expression. Bioinformatics (Oxford, England) 2015;31(11):1854-1856.
Keşmir, C., et al. Prediction of proteasome cleavage motifs by neural networks. Protein engineering 2002;15(4):287-296.
Khanna, R. Adoptive immunotherapy: methods and protocols. Immunol Cell Biol 2005;83(3):313-313.
Kohm, A.P., Fuller, K.G. and Miller, S.D. Mimicking the way to autoimmunity: an evolving theory of sequence and structural homology. Trends in Microbiology 2003;11(3):101-105.
Kumar, A. and Delogu, F. Dynamical footprint of cross-reactivity in a human autoimmune T-cell receptor. Scientific Reports 2017;7:42496.
Larsen, M.V., et al. An integrative approach to CTL epitope prediction: a combined algorithm integrating MHC class I binding, TAP transport efficiency, and proteasomal cleavage predictions. European journal of immunology 2005;35(8):2295-2303.
Ludewig, B. and Hoffmann, M.W. Adoptive immunotherapy : methods and protocols. Totowa, N.J.: Humana Press; 2005.
Maus, M.V., et al. Adoptive immunotherapy for cancer or viruses. Annu Rev Immunol 2014;32:189-225.
Moise, L., et al. T cell epitope redundancy: cross-conservation of the TCR face between pathogens and self and its implications for vaccines and autoimmunity. Expert Review of Vaccines 2016;15(5):607-617.
Nalepa, G., Rolfe, M. and Harper, J.W. Drug discovery in the ubiquitin-proteasome system. Nat Rev Drug Discov 2006;5(7):596-613.
Pruitt, K.D., et al. NCBI Reference Sequences (RefSeq): current status, new features and genome annotation policy. Nucleic acids research 2012;40(Database issue):D130-135.
Vigneron, N. Human Tumor Antigens and Cancer Immunotherapy. BioMed Research International 2015;2015:17.
Vigneron, N., et al. Database of T cell-defined human tumor antigens: the 2013 update. Cancer immunity 2013;13:15.
Vincent, J.L., et al. The SOFA (Sepsis-related Organ Failure Assessment) score to describe organ dysfunction/failure. On behalf of the Working Group on Sepsis-Related Problems of the European Society of Intensive Care Medicine. Intensive Care Med 1996;22(7):707-710.
Vita, R., et al. The immune epitope database (IEDB) 3.0. Nucleic acids research 2015;43(D1):D405-D412.
Wang, J. and Maldonado, M.A. The ubiquitin-proteasome system and its role in inflammatory and autoimmune diseases. Cellular & Molecular Immunology 2006;3(4):255-262.
Details of text files
The link to the text file at the bottom of the page provides the output of the program for download. For each of the databases, the file contains a table listing all transcripts of the matching proteins that contain the given peptide or variants thereof (called ‘natural epitopes’, NPs).
Each table is sorted by ascending number of mismatches and descending combined score.
The first two columns contain the RefSeq protein identifier to which the matching peptide belongs and the part of the protein sequence that matched the input sequence:
‘RefSeq ID’ (ID from RefSeq: NCBI Reference Sequence Database)
‘Epitope’ (aa sequence)
The subsequent columns contain the following information:
‘Sequence Position’ - the starting position of the epitope in the full-length RefSeq entry,
‘Number of mismatches’ - the number of mismatches in the variable positions of the epitope (k),
‘Cleavage score’ - the score of proteasomal cleavage,
‘TAP score’ - the score of binding to the TAP transporter,
‘MHC score’ - the score of its affinity to MHC class I alleles; it is given after the HLA type and “:”, e.g. 5 in “HLA-A02:01:5”,
‘Combined score’ - a measure of how likely the epitope is to be generated and presented by cells; it is calculated as described in the associated publication as Q = Pcl/(Atap*Amhc),
‘String ID’ - STRING ID, composed of '9606.' and ENSP (Ensembl Protein ID),
‘Transcript ID’ - ENST (Ensembl transcript ID),
‘Gene ID’ - ENSG (Ensembl gene ID),
‘Gene Name’ - gene name.
These are followed by the tissue columns, which differ between databases. The values are given in ppm (parts per million), i.e. they are normalized so that the sum of all entries in the database for a given tissue equals 1 million.
For example, the PaxDB table has the following columns and first line of the example data:
refseqID epitope index mismatch cleavageScore tapScore mhcScore combinedScore stringID transcriptID geneID geneName Brain Heart Lung Liver Kidney Prostate gland Pancreas Gall bladder Colon Esophagus Rectum Uterus Female gonad Testis Placenta Skin Plasma Platelet Saliva Urine Whole organism Cell line
NP_004979.3 KVLEYVIKV 277 0 4.454E-3 2.355E-1 HLA-A02:01:5, 3.783E-3 9606.ENSP00000349085 ENST00000356661 ENSG00000198681 MAGEA1 0.000 0.000 0.000 7.580 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 2.020 1.400 0.2000 0.000 0.000 0.000 0.000 0.000 2.610 10.90
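A row of this kind can be read with a few lines of Python. The sketch below assumes that the downloaded file is tab-separated (tissue names such as "Prostate gland" contain spaces) and that several alleles in the mhcScore field would be separated by commas; both are assumptions. The function names are hypothetical, and only three of the tissue columns from the example are kept for brevity.

```python
FLOAT_COLUMNS = ("cleavageScore", "tapScore", "combinedScore")


def parse_mhc_field(field):
    """Turn 'HLA-A02:01:5,' into {'HLA-A02:01': 5.0}."""
    scores = {}
    for entry in field.split(","):
        entry = entry.strip()
        if entry:
            allele, value = entry.rsplit(":", 1)  # the score follows the last ':'
            scores[allele] = float(value)
    return scores


def parse_row(header, line, n_fixed=12):
    """Parse one data line; the first n_fixed columns are the same for all databases."""
    names = header.split("\t")
    cells = line.split("\t")
    record = dict(zip(names[:n_fixed], cells[:n_fixed]))
    record["index"] = int(record["index"])
    record["mismatch"] = int(record["mismatch"])
    for name in FLOAT_COLUMNS:
        record[name] = float(record[name])
    record["mhcScore"] = parse_mhc_field(record["mhcScore"])
    record["tissues"] = {n: float(c) for n, c in zip(names[n_fixed:], cells[n_fixed:])}
    return record


header = "\t".join([
    "refseqID", "epitope", "index", "mismatch", "cleavageScore", "tapScore",
    "mhcScore", "combinedScore", "stringID", "transcriptID", "geneID", "geneName",
    "Brain", "Liver", "Testis",
])
line = "\t".join([
    "NP_004979.3", "KVLEYVIKV", "277", "0", "4.454E-3", "2.355E-1",
    "HLA-A02:01:5,", "3.783E-3", "9606.ENSP00000349085", "ENST00000356661",
    "ENSG00000198681", "MAGEA1", "0.000", "7.580", "1.400",
])

record = parse_row(header, line)
print(record["geneName"], record["mhcScore"], record["tissues"])
```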
Additionally, Expitope lists proteins that contain the provided epitope but could not be matched to a transcript identifier. These are usually automatically predicted proteins (recognizable by identifiers starting with "XP_" instead of "NP_") whose real existence has not yet been confirmed.
References
iCrossR / Expitope2.0
Assessment of cancer and virus antigens for cross-reactivity in human tissues. Jaravine V., Raffegerst S., Schendel D.J., Frishman D. Bioinformatics. 2017 Jan 1;33(1):104-111. PMID:27614350
Abundance data
Version 4.0 of PaxDb: Protein abundance data, integrated across model organisms, tissues, and cell-lines. Wang M, Herrmann CJ, Simonovic M, Szklarczyk D, von Mering C. Proteomics. 15(18):3163-8, 2015. PMID:25656970
Expression Atlas
The data used for this service are imported from Expression Atlas, for the 'Homo sapiens' group in the baseline experiments section.
NetChop
The role of the proteasome in generating cytotoxic T cell epitopes: Insights obtained from improved predictions of proteasomal cleavage. M. Nielsen, C. Lundegaard, O. Lund, and C. Keşmir. Immunogenetics, 57(1-2):33-41, 2005. PMID:15744535
Prediction of proteasome cleavage motifs by neural networks. C. Keşmir, A. Nussbaum, H. Schild, V. Detours, and S. Brunak. Prot. Eng., 15(4): 287-296, 2002. PMID:11983929
TAP affinity
Identifying MHC Class I Epitopes by Predicting the TAP Transport Efficiency of Epitope Precursors. B. Peters, S. Bulik, R. Tampe, P. M. van Endert, H-G. Holzhütter. J Immunol, 171:1741-1749, 2003. PMID:12902473
NetMHC
Reliable prediction of T-cell epitopes using neural networks with novel sequence representations. Nielsen M, Lundegaard C, Worning P, Lauemøller SL, Lamberth K, Buus S, Brunak S, Lund O. Protein Sci., 12:1007-17, 2003. PMID:12717023
NetMHC-3.0: accurate web accessible predictions of human, mouse and monkey MHC class I affinities for peptides of length 8-11. Lundegaard C, Lamberth K, Harndahl M, Buus S, Lund O, Nielsen M. Nucleic Acids Res. 1;36(Web Server issue):W509-12. 2008. PMID:18463140
Accurate approximation method for prediction of class I MHC affinities for peptides of length 8, 10 and 11 using prediction tools trained on 9mers. Lundegaard C, Lund O, Nielsen M. Bioinformatics, 24(11):1397-98, 2008. PMID:18413329
Expression data
An integrated encyclopedia of DNA elements in the human genome. The ENCODE Project Consortium. Nature, 489(7414):57-74, 2012. PMID:22955616
Alternative isoform regulation in human tissue transcriptomes. Wang ET, Sandberg R, Luo S, Khrebtukova I, Zhang L, Mayr C, Kingsmore SF, Schroth GP, Burge CB. Nature 456:470-476, 2008. PMID:18978772
Authors
Expitope2.0 was designed and created by members of the Frishman group, Department of Genome-Oriented Bioinformatics of the Technical University Munich, in collaboration with Medigene Immunotherapies GmbH, a subsidiary of Medigene AG.
This web site was created by Dr. Victor Jaravine. If you encounter any problems or have comments regarding Expitope2.0, please contact v.jaravine(a)medigene.com.
Disclaimer
No warranties or guaranties concerning this web based tool "Expitope2.0", express or implied, including but not limited to a warranty of merchantability or fitness for a particular purpose is given. This web based tool "Expitope2.0" is solely intended to be an initial source of information in TCR selection based on certain assumptions that may or may not align with your specific conditions, without the warranty or guaranty of generated data to be complete, reliable and externally verified. This web based tool "Expitope2.0" does not provide any warranty or guaranty regarding toxicology data and information in animals or humans and does not provide any data and information regarding protein sequences.
No warranties or guaranties are given that the generated data and information will meet your requirements or operate under your specific conditions of use. In addition, no warranties or guaranties are given that use of this web based tool "Expitope2.0" will be secure, error free, or free from interruption. The user shall determine upon its sole responsibility whether generated data and information sufficiently meets any needs and requirements. The user is solely responsible and liable for any loss incurred due to failure of this web based tool "Expitope2.0" to meet the user’s requirements. No liability is given to the user or any other third party for indirect, consequential, special or punitive damages resulting from the use of the web based tool "Expitope2.0".