Assessment of epitope cross-reactivity in human tissues





Help


Overview of server function and output results

The web server assesses your T cell therapy target (an epitope or peptide antigen) for potential cross-reactivity (CR) in human tissues, using gene expression and proteomics databases.

After you submit an epitope sequence and select its associated allele type (details in the Input section below), the results page shows one table per database. The 'PaxDB' table, for example, contains the protein abundances of the matching natural epitopes (rows) in the human tissues (columns).

For the visualization, a radar or bar plot, we use the sum of the expression or abundance values in each tissue column. A table at the bottom of the results page lists the CR-indices for all databases.
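The per-tissue sums used for the plots can be illustrated with a minimal Python sketch. The function name tissue_sums and the second data row are hypothetical and serve only as an illustration; this is not the server's actual code.

```python
def tissue_sums(rows, tissues):
    """Sum the abundance (or expression) values of all matching epitopes per tissue."""
    return {tissue: sum(row[tissue] for row in rows) for tissue in tissues}


tissues = ["Liver", "Testis", "Placenta"]
rows = [
    {"Liver": 7.58, "Testis": 1.40, "Placenta": 0.20},  # one matching natural epitope
    {"Liver": 0.00, "Testis": 3.10, "Placenta": 0.00},  # hypothetical second match
]

# One value per tissue column; these are the values shown in the radar or bar plot.
profile = tissue_sums(rows, tissues)
```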

Input

Epitope Sequence and Number of Mismatches

The two main input parameters are the epitope amino acid sequence and the number of mismatches. In the input form, please provide a string of amino acids in one-letter code and the number of allowed mismatches as an integer value. With the default setting of zero mismatches, only exact matches to the given epitope are returned.

Please note that the amino acid sequence must be at least seven positions long in order to avoid excessive database matches. For the same reason, the number of mismatches cannot exceed half the length of the provided epitope. Furthermore, MHC binding scores can only be provided for epitopes with a length between 8 and 14 positions, due to the implementation of the underlying software.
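The limits above can be summarized in a short Python sketch. The function validate_query is a hypothetical helper written for this help text; the server's own checks may differ in detail.

```python
def validate_query(epitope: str, mismatches: int) -> bool:
    """Check the length and mismatch limits described above.

    Returns True if MHC binding scores can be computed (length 8 to 14),
    False if the query is valid but too short or too long for MHC scoring.
    """
    length = len(epitope)
    if length < 7:
        raise ValueError("the epitope must have at least seven positions")
    if mismatches < 0 or 2 * mismatches > length:
        raise ValueError("mismatches must be between 0 and half the epitope length")
    return 8 <= length <= 14


mhc_scores_available = validate_query("EVDPASNTY", 3)
```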

To define fixed positions in the epitope input sequence, check the corresponding box and enter the sequence in a case-sensitive manner: upper case marks a fixed position and lower case a variable one. E.g., KvaeLvhfL defines a nonamer in which the first, fifth and last positions cannot be modified.
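The following Python sketch shows one way such a case-sensitive pattern could be matched against a protein sequence. The function names, the 1-based start position and the sliding-window search are assumptions made for illustration, not a description of the server's implementation.

```python
def count_mismatches(pattern, candidate):
    """Count mismatches at variable (lower-case) positions.

    Returns None if the lengths differ or a fixed (upper-case) position differs.
    """
    if len(pattern) != len(candidate):
        return None
    mismatches = 0
    for p, c in zip(pattern, candidate.upper()):
        if p.upper() == c:
            continue
        if p.isupper():  # fixed position must match exactly
            return None
        mismatches += 1
    return mismatches


def find_matches(pattern, protein, max_mismatches):
    """Slide the pattern along the protein and collect (start, peptide, mismatches)."""
    hits = []
    n = len(pattern)
    for start in range(len(protein) - n + 1):
        window = protein[start:start + n]
        k = count_mismatches(pattern, window)
        if k is not None and k <= max_mismatches:
            hits.append((start + 1, window, k))  # 1-based start position (assumed)
    return hits


# The protein string below is a made-up toy sequence.
hits = find_matches("KvaeLvhfL", "MMKVAELVHFLAA", 0)
```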



Proteasomal Cleavage Prediction

Cleavage probability is predicted with the program NetChop 3.1, with the prediction method set to "C-term 3.0" for predicting the boundaries of cytotoxic T cell (CTL) epitopes. NetChop uses a neural network trained on a database containing 1,260 publicly available MHC class I ligands. The cleavage probability P of the current epitope is calculated with the method described in the original paper by Keşmir et al. (2002), using the following formula:

P = Pc * Pcon

wherein Pc is the probability that the peptide is cleaved exactly at the C-terminus and Pcon represents the probability of the rest of the peptide staying intact:

where Oi represents the output of the network for position i of the peptide. The default value of 0.7 is used for the parameter t, as suggested by the authors; users can provide their own threshold value as a floating point number between 0 and 1 in the input form.
Because the overall cleavage probability is a product, it becomes very small for longer input sequences. It is therefore advisable to rely on this score only for peptides of seven to eleven amino acids, the epitope size for which NetChop has been most extensively tested, although the calculation itself does not limit the input to a certain length.
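As an illustration, the calculation could look like the Python sketch below. It rests on two assumptions that should be checked against Keşmir et al. (2002): that Pc is the network output at the C-terminal position, and that Pcon is the product of (1 - Oi) over all other positions whose output exceeds the threshold t. The function name and the example outputs are hypothetical.

```python
from math import prod


def cleavage_probability(outputs, t=0.7):
    """Combine per-position network outputs into one cleavage probability.

    outputs: network outputs O_i for each peptide position, C-terminus last.
    Assumption: Pc = O at the C-terminus; Pcon = product of (1 - O_i) over
    the remaining positions with O_i > t.
    """
    if not outputs:
        raise ValueError("at least one network output is required")
    p_c = outputs[-1]
    p_con = prod(1.0 - o for o in outputs[:-1] if o > t)
    return p_c * p_con


# Hypothetical outputs for a four-residue stretch; only 0.8 exceeds t = 0.7.
p = cleavage_probability([0.1, 0.8, 0.2, 0.9])
```

Each internal position above the threshold multiplies the result by a factor below one, which is why the score shrinks quickly for long peptides.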



TAP Affinity Prediction

Peters et al. (2003) established a 9 x 20 matrix, mat(i,j), that contains, for each amino acid at every possible position of a nonamer epitope, a log(IC50) value; these values can be summed up to obtain an IC50 for the complete peptide.
When testing the divergence between predicted and experimentally determined IC50 values, the authors concluded that the best concordance is achieved when precursor peptides are taken into account, i.e. instead of the initial nonamer they calculated the affinity for an N-terminally elongated sequence. As the length of the epitopes submitted to our server is defined by the users, we modified the established formula to work without precursor sequences:


where Ni denotes the i-th amino acid from the N-terminus of the given peptide and α is a weight that determines the influence of the N-terminal residues on the overall affinity score. Hence, only the IC50 value for the C-terminal residue and a weighted sum of the values for the three N-terminal amino acids are used for the scoring. The authors experimentally determined the best value for the weight α to be 0.2, but the user can change it to another floating point number in the Expitope web server.
Although it is technically possible to score peptides of any length ≥ 4 with this approach, it has to be kept in mind that the matrices were constructed on the basis of nonamer epitopes and have only been extensively tested on those or on potential precursors. When analyzing longer peptides, the returned values might not reflect the real affinity to TAP, and it could be beneficial to exclude the N-terminus in such cases.
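A minimal Python sketch of this scoring scheme is given below. The exact form is an assumption: the C-terminal matrix entry plus α times the plain sum of the entries for the first three residues. The matrix layout (a dictionary keyed by 0-based position and amino acid) and all numeric matrix values are placeholders, not the published values of Peters et al. (2003).

```python
def tap_score(peptide, mat, alpha=0.2):
    """Score a peptide from its C-terminal residue and its three N-terminal residues.

    mat maps (position, amino_acid) to a log(IC50) contribution, with
    positions 0..8 for the nine matrix columns (assumed layout).
    """
    if len(peptide) < 4:
        raise ValueError("the peptide must have at least four residues")
    c_term = mat[(8, peptide[-1])]  # last matrix position, C-terminal residue
    n_term = sum(mat[(i, peptide[i])] for i in range(3))
    return c_term + alpha * n_term


# Placeholder matrix entries, for illustration only.
toy_mat = {(0, "E"): 1.0, (1, "V"): -0.5, (2, "D"): 0.5, (8, "Y"): -2.0}
score = tap_score("EVDPASNTY", toy_mat)
```

Because only four residues enter the sum, the middle of the peptide has no influence on the score, whatever its length.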



MHC Binding Affinity

To predict the affinity of the epitopes to the major histocompatibility complex class I (MHC-I) for a large range of HLA alleles, we integrated the portable version of NetMHC 4.0. The tool uses artificial neural networks trained on different MHC alleles and returns the affinity of a given peptide to the specified alleles as IC50 values in nM. Due to the size limitations implemented in NetMHC, only peptides with a length between 8 and 14 amino acids can be used for affinity prediction. The authors explicitly state that predictions for peptides longer than eleven positions have not been extensively validated, and that caution should be taken with octamer predictions, as some alleles might not bind them to any significant extent. Expitope users can select any number of HLA types for affinity prediction, from one to all; the default allele is HLA-A0201.



Combined Score

To rank all matches to the user's query by their real potential to function as an epitope, we apply a scoring function as proposed by Keşmir et al. (2002). It combines the probabilities that a given peptide is cleaved from its original sequence, transported to the endoplasmic reticulum and bound by MHC class I proteins. The resulting score Q is defined as:

Q = P/(Atap*Amhc)

where P is the proteasomal cleavage probability and Atap and Amhc are the affinities, as IC50 values, to the transporter associated with antigen processing (TAP) and to the MHC complex, as explained above.
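As a worked example, the Python sketch below computes Q from the three scores of the example PaxDB output line shown in the section 'Details of text files' (cleavage score 4.454E-3, TAP score 2.355E-1, MHC score 5). The function name combined_score is hypothetical.

```python
def combined_score(p_cleavage, a_tap, a_mhc):
    """Q = P / (Atap * Amhc): a higher Q indicates a more likely epitope."""
    return p_cleavage / (a_tap * a_mhc)


q = combined_score(4.454e-3, 2.355e-1, 5.0)
print(f"{q:.3E}")  # prints 3.783E-03, the combined score listed in the example line
```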



Amino acid sequence: single-letter code, in which lower case specifies a flexible position and upper case a fixed one. The sequence field is followed by a check box, which should be selected if the sequence distinguishes flexible and fixed positions (by default, all positions are flexible).

Combined score threshold (Q): the suggested value of 0.02 selects the top 9% of the HLA binders (ensuring binding with 86% certainty, as shown statistically). The default value of 1e-4 selects the top 50% of binders.



iCrossR - calculation of cross-reactivity index and profile

The 'Compute Cross-Reactivity' box can be checked to perform the calculation described in the iCrossR publication (Jaravine, V., et al. Assessment of cancer and virus antigens for cross-reactivity in human tissues. Bioinformatics (Oxford, England) 2017;33(1):104-111.).

After you submit an epitope sequence and select its associated allele type, the results page shows a summary of the results in the form of a table (called the CR-profile), containing the sums of abundances of the matching natural epitopes in the tissues. The profile is a 4x22 table, which is used for the visualization map (PNG image). The CR-index is calculated from the profile using the formulae given in the Methods section of the publication.



Example

The example can be invoked by pressing the 'EVDPASNTY' button. It sets the sequence to EVDPASNTY, sets the number of mismatches to 3 and selects the allele HLA-A0101. The thresholds and the TAP weight can be left at their default values. Press the 'Submit' button to submit the job. The calculation usually takes about 2 minutes, depending on the number of sequence matches between the given epitope and all human proteins. This epitope belongs to one of the cancer-testis (CT) antigens, which are usually expressed in Testis/Ovary/Female gonad tissues.



Output

After the job has finished, the results page shows the table and its visualization for the first database. The view can be changed with the database selection list and the plot type selection list.

At the bottom of the page there is a link to download a file with the raw output of the program (in text format), and a link to the same data in Excel SLK format.



Databases: descriptions, titles, references

Table. Sources of gene expression and protein abundance data.

# | Data source | ID | Name | Number of tissues | Type | References
1 | PaxDB | Pax4 | PaxDB v4.0 | 22 | protein abundance | (Wang, et al., 2015)
2 | Expression Atlas | E-Prot-3 | Human Protein Atlas | 44 | protein abundance | (Petryszak, et al., 2016), (Uhlen, et al., 2015)
3 | Expression Atlas | E-Prot-1 | Human Proteome Map | 23 | protein abundance | (Petryszak, et al., 2016), (Kim, et al., 2014)
4 | Expression Atlas | E-Mtab-513 | Illumina Body Map | 16 | gene expression | (Petryszak, et al., 2016), (Barbosa-Morais, et al., 2012)
5 | Expression Atlas | E-Mtab-5214 | GTEx | 53 | gene expression | (Petryszak, et al., 2016), (Lonsdale, et al., 2013)
6 | Wang et al., 2008 | Wang | Wang 2008 | 7 | gene expression | (Wang, et al., 2008)
7 | Expression Atlas | E-Mtab-3358 | FANTOM5 RIKEN | 56 | gene expression | (Petryszak, et al., 2016), (The FANTOM5 project, 2014)



References

For the databases:

Wang, M., et al. Version 4.0 of PaxDb: Protein abundance data, integrated across model organisms, tissues, and cell-lines. Proteomics 2015;15(18):3163-3168.

Petryszak, R., et al. Expression Atlas update--an integrated database of gene and protein expression in humans, animals and plants. Nucleic acids research 2016;44(D1):D746-752.

Uhlen, M., et al. Proteomics. Tissue-based map of the human proteome. Science (New York, N.Y.) 2015;347(6220):1260419.

Kim, M.-S., et al. A draft map of the human proteome. Nature 2014;509(7502):575-581.

Barbosa-Morais, N.L., et al. The evolutionary landscape of alternative splicing in vertebrate species. Science (New York, N.Y.) 2012;338(6114):1587-1593.

Lonsdale, J., et al. The Genotype-Tissue Expression (GTEx) project. Nat Genet 2013;45(6):580-585.

Wang, E.T., et al. Alternative isoform regulation in human tissue transcriptomes. Nature 2008;456(7221):470-476.

The Fantom Consortium (Riken). A promoter-level mammalian expression atlas. Nature 2014;507(7493):462-470.



Other references:

Boegel, S., et al. HLA typing from RNA-Seq sequence reads. Genome Medicine 2012;4(12):102.

Haase, K., et al. Expitope: a web server for epitope expression. Bioinformatics (Oxford, England) 2015;31(11):1854-1856.

Keşmir, C., et al. Prediction of proteasome cleavage motifs by neural networks. Protein engineering 2002;15(4):287-296.

Khanna, R. Adoptive immunotherapy: methods and protocols. Immunol Cell Biol 2005;83(3):313-313.

Kohm, A.P., Fuller, K.G. and Miller, S.D. Mimicking the way to autoimmunity: an evolving theory of sequence and structural homology. Trends in Microbiology 2003;11(3):101-105.

Kumar, A. and Delogu, F. Dynamical footprint of cross-reactivity in a human autoimmune T-cell receptor. Scientific Reports 2017;7:42496.

Larsen, M.V., et al. An integrative approach to CTL epitope prediction: a combined algorithm integrating MHC class I binding, TAP transport efficiency, and proteasomal cleavage predictions. European journal of immunology 2005;35(8):2295-2303.

Ludewig, B. and Hoffmann, M.W. Adoptive immunotherapy : methods and protocols. Totowa, N.J.: Humana Press; 2005.

Maus, M.V., et al. Adoptive immunotherapy for cancer or viruses. Annu Rev Immunol 2014;32:189-225.

Moise, L., et al. T cell epitope redundancy: cross-conservation of the TCR face between pathogens and self and its implications for vaccines and autoimmunity. Expert Review of Vaccines 2016;15(5):607-617.

Nalepa, G., Rolfe, M. and Harper, J.W. Drug discovery in the ubiquitin-proteasome system. Nat Rev Drug Discov 2006;5(7):596-613.

Pruitt, K.D., et al. NCBI Reference Sequences (RefSeq): current status, new features and genome annotation policy. Nucleic acids research 2012;40(Database issue):D130-135.

Vigneron, N. Human Tumor Antigens and Cancer Immunotherapy. BioMed Research International 2015;2015:17.

Vigneron, N., et al. Database of T cell-defined human tumor antigens: the 2013 update. Cancer immunity 2013;13:15.

Vincent, J.L., et al. The SOFA (Sepsis-related Organ Failure Assessment) score to describe organ dysfunction/failure. On behalf of the Working Group on Sepsis-Related Problems of the European Society of Intensive Care Medicine. Intensive Care Med 1996;22(7):707-710.

Vita, R., et al. The immune epitope database (IEDB) 3.0. Nucleic acids research 2015;43(D1):D405-D412.

Wang, J. and Maldonado, M.A. The ubiquitin-proteasome system and its role in inflammatory and autoimmune diseases. Cellular & Molecular Immunology 2006;3(4):255-262.



Details of text files

The link at the bottom of the page downloads the output of the program as a text file. For each database, the file contains a table listing all transcripts of the matching proteins that contain the given peptide or variants thereof (called ‘natural epitopes’ (NPs)).

Each table is sorted by ascending number of mismatches and descending combined score.
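This ordering corresponds to a two-part sort key, as in the Python sketch below. The record layout and all values are hypothetical; only the field names mismatch and combinedScore follow the column names of the output file.

```python
def sort_matches(matches):
    """Order matches by ascending mismatches, then by descending combined score."""
    return sorted(matches, key=lambda m: (m["mismatch"], -m["combinedScore"]))


# Hypothetical matches, for illustration only.
matches = [
    {"epitope": "EVDPASNTA", "mismatch": 1, "combinedScore": 2.0e-3},
    {"epitope": "EVDPASNTY", "mismatch": 0, "combinedScore": 1.0e-3},
    {"epitope": "EVDPTSNTY", "mismatch": 1, "combinedScore": 5.0e-3},
]
ordered = [m["epitope"] for m in sort_matches(matches)]
```

The exact match comes first even though its combined score is the lowest, because the number of mismatches takes precedence.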

The first two columns contain the RefSeq protein identifier to which the matching peptide belongs and the part of the protein sequence that matched the input sequence:



‘RefSeq ID’ (ID from RefSeq: NCBI Reference Sequence Database)

‘Epitope’ (aa sequence)



The subsequent columns contain the following information:

‘Sequence Position’ - the starting position of the epitope in the full-length RefSeq entry,

‘Number of mismatches’ - the number of mismatches in the variable positions of the epitope (k),

‘Cleavage score’ - the score of proteasomal cleavage,

‘TAP score’ - the score of binding to the TAP transporter,

‘MHC score’ - the score of its affinity to MHC class I alleles; it is given after the HLA type and “:”, e.g. 5 in “HLA-A02:01:5”,

‘Combined score’ - the probability that the epitope will be created and presented by cells; it is calculated as described in the associated publication as Q = Pcl/(Atap*Amhc),

‘String ID’ - STRING ID, composed of '9606.' and ENSP (Ensembl Protein ID),

‘Transcript ID’ - ENST (Ensembl transcript ID),

‘Gene ID’ - ENSG (Ensembl gene ID),

‘Gene Name’ - gene name.
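The ‘MHC score’ field can be split into allele and score as in the Python sketch below. It assumes that entries for several alleles are separated by commas, which the trailing comma in the example output suggests but which is not stated explicitly; parse_mhc_field is a hypothetical helper.

```python
def parse_mhc_field(field):
    """Split an MHC score field such as 'HLA-A02:01:5,' into {allele: score}."""
    scores = {}
    for entry in field.split(","):
        entry = entry.strip()
        if not entry:
            continue  # skip the empty piece after a trailing comma
        # The allele name itself contains ':', so split at the last one only.
        allele, value = entry.rsplit(":", 1)
        scores[allele] = float(value)
    return scores


mhc = parse_mhc_field("HLA-A02:01:5,")
```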



These are followed by the columns for the tissues, which differ between databases. The values are in ppm (parts per million), i.e. they are normalized so that the sum of all entries in the database for a tissue equals 1 million.
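The ppm normalization can be sketched in Python as follows. The function to_ppm and the raw values are hypothetical; the databases already provide normalized values, so this only illustrates what the unit means.

```python
def to_ppm(raw_values):
    """Scale the raw values of one tissue so that they sum to one million."""
    total = sum(raw_values.values())
    if total == 0:
        raise ValueError("cannot normalize a tissue with no signal")
    return {protein: value / total * 1_000_000 for protein, value in raw_values.items()}


# Two made-up proteins measured in one tissue.
ppm = to_ppm({"protein_A": 1.0, "protein_B": 3.0})
```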



For example, the PaxDB table has the following columns and first line of the example data:

refseqID epitope index mismatch cleavageScore tapScore mhcScore combinedScore stringID transcriptID geneID geneName Brain Heart Lung Liver Kidney Prostate gland Pancreas Gall bladder Colon Esophagus Rectum Uterus Female gonad Testis Placenta Skin Plasma Platelet Saliva Urine Whole organism Cell line

NP_004979.3 KVLEYVIKV 277 0 4.454E-3 2.355E-1 HLA-A02:01:5, 3.783E-3 9606.ENSP00000349085 ENST00000356661 ENSG00000198681 MAGEA1 0.000 0.000 0.000 7.580 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 2.020 1.400 0.2000 0.000 0.000 0.000 0.000 0.000 2.610 10.90



Additionally, Expitope lists proteins that contain the provided epitope but could not be matched to a transcript identifier. These are usually automatically predicted proteins (recognizable by an identifier starting with "XP_" instead of "NP_") whose existence has not yet been confirmed.




References


iCrossR / Expitope2.0

Assessment of cancer and virus antigens for cross-reactivity in human tissues.
Jaravine V., Raffegerst S., Schendel D.J., Frishman D. Bioinformatics. 2017 Jan 1;33(1):104-111. PMID:27614350


Abundance data

Version 4.0 of PaxDb: Protein abundance data, integrated across model organisms, tissues, and cell-lines. Wang M, Herrmann CJ, Simonovic M, Szklarczyk D, von Mering C. Proteomics. 15(18):3163-8, 2015. PMID:25656970

Expression Atlas

The data used for this service are imported from Expression Atlas, for the 'Homo sapiens' group in the baseline experiments section.

NetChop

The role of the proteasome in generating cytotoxic T cell epitopes: Insights obtained from improved predictions of proteasomal cleavage.
M. Nielsen, C. Lundegaard, O. Lund, and C. Keşmir. Immunogenetics, 57(1-2):33-41, 2005. PMID:15744535

Prediction of proteasome cleavage motifs by neural networks.
C. Keşmir, A. Nussbaum, H. Schild, V. Detours, and S. Brunak. Prot. Eng., 15(4): 287-296, 2002. PMID:11983929

TAP affinity

Identifying MHC Class I Epitopes by Predicting the TAP Transport Efficiency of Epitope Precursors.
B. Peters, S. Bulik, R. Tampe, P. M. van Endert, H-G. Holzhütter. J Immunol, 171:1741-1749, 2003. PMID:12902473

NetMHC

Reliable prediction of T-cell epitopes using neural networks with novel sequence representations.
Nielsen M, Lundegaard C, Worning P, Lauemøller SL, Lamberth K, Buus S, Brunak S, Lund O. Protein Sci., 12:1007-17, 2003. PMID:12717023

NetMHC-3.0: accurate web accessible predictions of human, mouse and monkey MHC class I affinities for peptides of length 8-11
Lundegaard C, Lamberth K, Harndahl M, Buus S, Lund O, Nielsen M. Nucleic Acids Res. 1;36(Web Server issue):W509-12. 2008. PMID:18463140

Accurate approximation method for prediction of class I MHC affinities for peptides of length 8, 10 and 11 using prediction tools trained on 9mers.
Lundegaard C, Lund O, Nielsen M. Bioinformatics, 24(11):1397-98, 2008. PMID:18413329

Expression data

An integrated encyclopedia of DNA elements in the human genome.
The ENCODE Project Consortium. Nature, 489(7414):57-74, 2012. PMID:22955616

Alternative isoform regulation in human tissue transcriptomes.
Wang ET, Sandberg R, Luo S, Khrebtukova I, Zhang L, Mayr C, Kingsmore SF, Schroth GP, Burge CB. Nature 456:470-476, 2008. PMID:18978772

Authors



Expitope2.0 was designed and created by members of the Frishman group, Department of Genome-Oriented Bioinformatics of the Technical University Munich, in collaboration with Medigene Immunotherapies GmbH, a subsidiary of Medigene AG.

This web site was created by Dr. Victor Jaravine. If you encounter any problems or have comments regarding Expitope2.0, please contact v.jaravine(a)medigene.com.

Disclaimer

No warranties or guaranties concerning this web based tool "Expitope2.0", express or implied, including but not limited to a warranty of merchantability or fitness for a particular purpose is given. This web based tool "Expitope2.0" is solely intended to be an initial source of information in TCR selection based on certain assumptions that may or may not align with your specific conditions, without the warranty or guaranty of generated data to be complete, reliable and externally verified. This web based tool "Expitope2.0" does not provide any warranty or guaranty regarding toxicology data and information in animals or humans and does not provide any data and information regarding protein sequences.

No warranties or guaranties are given that the generated data and information will meet your requirements or operate under your specific conditions of use. In addition, no warranties or guaranties are given that use of this web based tool "Expitope2.0" will be secure, error free, or free from interruption. The user shall determine upon its sole responsibility whether generated data and information sufficiently meets any needs and requirements. The user is solely responsible and liable for any loss incurred due to failure of this web based tool "Expitope2.0" to meet the user’s requirements. No liability is given to the user or any other third party for indirect, consequential, special or punitive damages resulting from the use of the web based tool "Expitope2.0".