MHC-I processing predictions

This website provides access to predictions of antigen processing through the MHC class I antigen presentation pathway. The goal of the prediction is to identify MHC-I ligands, i.e. peptides that are naturally processed from their source proteins and presented by MHC class I molecules. The screenshot below illustrates the steps necessary to make a prediction. Each of the steps is described in more detail below.

First, specify the sequences you want to scan for MHC-I ligands. The sequences can either be entered directly into the text area labeled "Enter protein sequence(s)" or be uploaded from a file using the button labeled "Browse". Please enter no more than 200 FASTA sequences, or upload a file of at most 10 MB, per query.
The sequences can be supplied in three different formats:

Space separated sequences
One continuous sequence
FASTA format

The format of the sequences can be specified explicitly using the list box labeled "Choose sequence format". If that list box is set to "auto detect format", the input will be interpreted as FASTA if an opening ">" character is found, or as a continuous sequence otherwise.
All sequences must consist of amino acids specified in single-letter code (ACDEFGHIKLMNPQRSTVWY).
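The auto-detection and alphabet rules above can be sketched in a few lines of Python. This is only an illustration, not the server's actual parser: the function name parse_sequences is hypothetical, "opening" is assumed to mean the first non-whitespace character, and unnamed FASTA records are assumed to be numbered in input order.

```python
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")

def parse_sequences(text):
    """Return a list of (name, sequence) pairs from raw input text."""
    text = text.strip()
    records = []
    if text.startswith(">"):
        # FASTA: a ">" header line starts each record.
        name, chunks = None, []
        for line in text.splitlines():
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if name is not None:
                    records.append((name, "".join(chunks)))
                name, chunks = line[1:].strip(), []
            else:
                chunks.append(line)
        records.append((name, "".join(chunks)))
    else:
        # Otherwise treat the input as one continuous sequence.
        records.append(("", "".join(text.split())))

    named = []
    for i, (name, seq) in enumerate(records, start=1):
        seq = seq.upper()
        bad = sorted(set(seq) - VALID_RESIDUES)
        if bad:
            raise ValueError(f"invalid residue(s) {bad} in record {i}")
        # Unnamed records are numbered in input order (assumption).
        named.append((name or f"sequence {i}", seq))
    return named
```

For example, parse_sequences(">p1\nACDE\nFGHI") yields one record named "p1" with the sequence ACDEFGHI, while input containing a letter such as X raises a ValueError.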

The prediction method list box allows choosing from a number of MHC class I binding prediction methods: Artificial neural network (ANN), Average relative binding (ARB), Stabilized matrix method (SMM), SMM with a Peptide:MHC Binding Energy Covariance matrix (SMMPMBEC), Scoring Matrices derived from Combinatorial Peptide Libraries (Comblib_Sidney2008) and NetMHCpan. Since 10/11/11, the IEDB recommended prediction method for the processing tool has been NetMHCpan rather than a consensus. This is because the processing tools require a quantitative IC50 value, which the consensus approach as implemented does not supply. In addition, NetMHCpan is available for all MHC alleles and has been shown to perform very well in recent comparisons. We will continue to re-evaluate the choice of IEDB recommended method.

The MHC binding predictions used here are available in a standalone version, and are described in more detail here. If you are not interested in predictions for a specific MHC allele, you still have to use these list boxes to determine the length of the MHC-I ligands of interest.

Predictions are not limited to peptides of one specific length binding to one specific allele; multiple allele/length pairs can be submitted at a time. An allele/peptide length combination can be selected using the list boxes in this section and added to a list by clicking the "Add" button. For some allele/peptide length combinations, no prediction tools exist because too little experimental data is available to generate them. For instance, selecting human as the MHC source species lets you choose from the set of MHC alleles and peptide lengths available for human. Alternatively, selecting mouse as the MHC source species offers a different set of MHC alleles and peptide lengths.

Selections in the listboxes in this section influence the values available in others. For example, selecting "mouse" as the MHC source species will limit the selections available in the MHC allele listbox. Similarly, the allele chosen will limit the available peptide lengths.
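To make concrete what submitting several allele/length pairs means for a given input sequence, the following sketch enumerates every candidate peptide of each requested length. It is an illustration only: the function name candidate_peptides, the example selections, and the use of 1-based inclusive positions are assumptions, not the tool's documented behavior.

```python
def candidate_peptides(sequence, length):
    """Yield (start, end, peptide) for every window of the given length.

    Positions are 1-based and inclusive (assumption).
    """
    for i in range(len(sequence) - length + 1):
        yield i + 1, i + length, sequence[i:i + length]

# Hypothetical list of selected allele / peptide length pairs.
selections = [("HLA-A*02:01", 9), ("HLA-A*02:01", 10)]

sequence = "ACDEFGHIKL"
for allele, length in selections:
    for start, end, peptide in candidate_peptides(sequence, length):
        print(allele, start, end, peptide)
```

A sequence of N residues gives N - length + 1 candidates per pair, so a sequence shorter than the requested length gives none.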

There are two types of proteasomes: the constitutively expressed 'house-keeping' type and immunoproteasomes, which are induced by IFN-γ secretion. The latter are thought to increase the efficiency of antigen presentation. If you are unsure, select the immunoproteasome type to make a prediction. The predictions are based on in vitro proteasomal digests of the enolase and casein proteins as described here.

Species Warning

Please note that both the proteasome and TAP predictions were developed using experimental data for the human versions of these molecules. At least for TAP molecules, some species-dependent differences in specificity are known. Therefore, when using these predictions for epitope processing in non-human cells, interpret the results with extra caution.

The TAP score estimates an effective -log(IC50) value for the binding to TAP of a peptide or its N-terminally prolonged precursors. It has been shown that high affinity of a peptide for TAP translates into high transport rates. Note that the original reference used +ln(IC50) values (ln = natural logarithm instead of log = base 10). The calculation of the score remains unchanged.
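The relation between the two conventions mentioned above is a change of sign and of logarithm base: -log(IC50) = -ln(IC50) / ln(10). The sketch below shows only this arithmetic. The function name is hypothetical, and it is not a description of how the tool computes the score internally.

```python
import math

def ln_ic50_to_neg_log10(ln_ic50):
    """Convert a +ln(IC50) value to the -log10(IC50) convention."""
    return -ln_ic50 / math.log(10)

# An IC50 of 100 has ln(IC50) = ln(100), which maps to -log10(100).
score = ln_ic50_to_neg_log10(math.log(100.0))
```

In both conventions a smaller IC50 means stronger binding; only after the sign change does a higher score correspond to higher affinity.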

The menus in this section change how the prediction output is displayed. Using the "Sort peptides by" listbox, the results can be presorted by the order of the peptides in their source sequence (default) or by their predicted scores. Use the listbox to specify which score to sort by.

To limit the number of results displayed, which can significantly reduce the time it takes to make a prediction, you can define a lower boundary for the prediction in the "cutoff" field. The listbox preceding the "cutoff" field selects which prediction the cutoff is applied to.
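The same sorting and cutoff logic can be reproduced on downloaded results. The sketch below is an illustration under assumptions: the row layout, the key names, and the score values are made up, and the cutoff is assumed to be inclusive.

```python
# Illustrative rows; the peptides and score values are made up.
rows = [
    {"peptide": "ACDEFGHIK", "start": 1, "total": -1.8},
    {"peptide": "CDEFGHIKL", "start": 2, "total": -0.4},
    {"peptide": "DEFGHIKLM", "start": 3, "total": -3.1},
]

def apply_cutoff(rows, score, cutoff):
    """Keep rows whose chosen score is at or above the lower boundary."""
    return [r for r in rows if r[score] >= cutoff]

def sort_by_score(rows, score):
    """Sort rows so that the highest score comes first."""
    return sorted(rows, key=lambda r: r[score], reverse=True)

kept = sort_by_score(apply_cutoff(rows, "total", -2.0), "total")
```

Because higher scores indicate higher predicted efficiency, sorting in descending order puts the best candidates first.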

To reuse the prediction results in an external program, it is possible to retrieve the predictions in a plain text format. To do this, choose "Text file" in the output format listbox.

This one is easy. Click the submit button, and a result screen similar to the one below should appear.

This table displays the sequences and their names extracted from the user input. If no names were assigned by the user (which is only possible in FASTA format), the sequences are numbered in their input order (sequence 1, sequence 2, ...).

Each row in this table corresponds to one peptide prediction. The columns contain the allele the prediction was made for, the position of the peptide in the input sequences (in the format [Sequence #]: [Start Position] - [End Position]), the length of the peptide, the peptide sequence and the predicted scores. The table can be sorted by clicking on the column headers.
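When post-processing the results, the position column can be split into its three numbers. The sketch below assumes the text looks like "2: 15 - 23" and tolerates differences in spacing; the function name parse_position and the assumption that end positions are inclusive are not taken from the tool's documentation.

```python
import re

# [Sequence #]: [Start Position] - [End Position], with flexible spacing.
_POSITION = re.compile(r"^\s*(\d+)\s*:\s*(\d+)\s*-\s*(\d+)\s*$")

def parse_position(text):
    """Return (sequence_number, start, end) as integers."""
    match = _POSITION.match(text)
    if match is None:
        raise ValueError(f"unrecognized position: {text!r}")
    seq_no, start, end = (int(g) for g in match.groups())
    return seq_no, start, end

seq_no, start, end = parse_position("2: 15 - 23")
peptide_length = end - start + 1  # assumes inclusive end positions
```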

The three primary prediction scores are:

Proteasome cleavage - The scores can be interpreted as logarithms of the total amount of cleavage site usage liberating the peptide C-terminus. This should not be taken literally, as it depends on many other factors, e.g. the amount of source protein degraded. However, all other things being equal, the difference between two scores can be translated into a difference in amounts.
TAP transport - The TAP score estimates an effective -log(IC50) value for the binding to TAP of a peptide or its N-terminally prolonged precursors. It has been shown that high affinity of a peptide for TAP translates into high transport rates. Note that the original reference used +ln(IC50) values. The calculation of the score remains unchanged.
MHC binding - The MHC binding prediction is identical to this one and described in more detail here. However, the output is not log(IC50) values, but -log(IC50) values. The sign change was introduced for consistency: all scores now associate higher values with higher predicted efficiency.
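Because the scores are on a logarithmic scale, a difference between two scores corresponds to a ratio of the underlying quantities. The sketch below assumes base-10 logarithms for the score being compared; the function names are hypothetical.

```python
import math

def ic50_to_score(ic50):
    """Convert an IC50 value to the -log10(IC50) convention."""
    return -math.log10(ic50)

def fold_difference(score_a, score_b):
    """Ratio of the underlying quantities implied by two log10 scores."""
    return 10 ** (score_a - score_b)

# A peptide scoring 1.0 higher corresponds to a tenfold difference.
ratio = fold_difference(1.5, 0.5)
```

For instance, an IC50 of 100 maps to a score of -2 and an IC50 of 1000 to -3, so the first peptide is predicted to bind ten times more strongly.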

In addition to the individual scores, two summary scores are calculated:

Processing - this score combines the proteasomal cleavage and TAP transport predictions. It predicts a quantity proportional to the amount of peptide present in the ER, where a peptide can bind to multiple MHC molecules. This allows predicting T-cell epitope candidates independent of MHC restriction.
Total - this score combines the proteasomal cleavage, TAP transport and MHC binding predictions. It predicts a quantity proportional to the amount of peptide presented by MHC molecules on the cell surface.
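This page does not state how the individual scores are combined. One natural reading, since all three scores are logarithmic, is that they are added, which corresponds to multiplying the underlying quantities. The sketch below makes that assumption explicit; treat it as an illustration rather than the tool's actual formula.

```python
def processing_score(proteasome, tap):
    """Assumed combination: sum of the two log-scale scores."""
    return proteasome + tap

def total_score(proteasome, tap, mhc):
    """Assumed combination: processing score plus the MHC binding score."""
    return processing_score(proteasome, tap) + mhc

# Made-up example values for one peptide.
total = total_score(proteasome=1.2, tap=0.3, mhc=-2.0)
```

Under this assumption, the processing score is independent of the MHC allele, and the total score differs from it only by the MHC binding term.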

A detailed evaluation of the correlation between predicted scores and antigenicity of peptides is currently being conducted and will help to better interpret prediction results.
