Curation Manual

Use of this Manual

The following is a collection of notes from curation meetings with the Epitope Council and serves as a suggested guideline for capturing immunological data in the IEDB's Internal Curation System (ICS). The following format is used in this manual to refer to aspects of ICS data entry:

  • The database is structured into categories with headings that contain several fields. Whenever referenced in this manual, [Field Categories] will be in brackets. If a Field Category is mentioned without specifying an accompanying Field, then the guideline applies to all fields within that category.
  • Fields in ICS will be underlined.
  • "Free Text" and Finder functions will be in quotations.
  • <Drop-down> selections from List Fields will be in angle brackets.

Important Note: Section headings highlighted in yellow reflect modifications from the previous version of the curation manual.

Purpose

The purpose of this manual is to ensure consistency and accuracy of literature based curation. All IEDB curators must:

1. Read the manual

2. Read the manual

3. Refer to the manual

4. Follow the manual

5. When experimental scenarios are encountered in the literature that do not conform to the guidelines found in the manual, these issues are to be discussed in the curation meeting to foster development of new guidelines or to adapt our current rules in order to accurately capture epitope related data.

Inclusion/Exclusion Criteria

All articles and epitopes must meet inclusion criteria in order to be included in the database.

Relevant Experimental Data

Experimental Data

The reference must report original and experimental epitope-related data.

  • Computer derived predictions without functional experimental data will not be included in the database.
  • Sequence analysis of defined epitopes will not be included in the database unless novel information is provided (e.g., identification of anchor residues).
  • Reviews and meta-analyses will not be included in the database.
  • Data not shown is not curated.
  • Personal communication is not sufficient for curation.
  • When encountering data from previous publications which are included in figures or tables, the curator should determine if the previous record was curated and if it was not, the PMID should be sent to the Document Specialist to be included in the database.

Scope and Exclusions

The experimental data must fall within the scope of the database.

Relevant data includes:

  • MHC binding data
  • Epitope elution from MHC (Naturally processed MHC ligands)
  • T cell responses to an epitope (including NK T cells)
  • B cell/antibody responses to an epitope

Certain categories of experimental data will be specifically excluded from the database.

Exclusion: NK Epitopes

Epitopes that are recognized by Natural Killer cells (non-T cell) will be excluded from the database. Please note: NK T cell epitopes/ligands will be included, as noted in section (#Scope and Exclusions).

Exclusion: Non-Immunological Interpretation of "Epitope"

Experimental data describing "epitopes" in non-immunological contexts will not be included in the database. For example, the structures that are in contact in protein-protein interactions are sometimes referred to as an epitope. This interpretation of "epitope" will not be included in the database.

Exclusion: Epitope Tags

References discussing epitope tags utilized as a technical tool for immunoprecipitation, purification, and similar experiments will be excluded from the database.

Exclusion: Superantigen

References discussing superantigens in the context of peptide-MHC-superantigen complexes will be excluded from the database. However, B cell/antibody responses to superantigens (especially Staphylococcus enterotoxin B (SEB), a NIAID Category B priority pathogen) will be curated.

Important Note: In the event superantigen is used as part of an assay to stabilize the interaction of an epitope with MHC, the epitope studied is curatable and should be captured.

Exclusion: TCR Antagonism

Data related to TCR antagonism will not be curated. Specifically, author-stated TCR antagonism will not be curated; however, TCR competition not specifically labeled as TCR antagonism by the authors will be curated. TCR antagonism used in an MHC binding inhibition assay will be captured as an MHC binding assay.

Exclusion: Antigen Processing

Data concerning the processing of antigens generated in order to study the effects of variables on processing rather than on the study of an epitope (i.e. the epitope is irrelevant) are not to be included in the database unless the identification of a novel epitope is demonstrated.

Exclusion: Adoptive Transfer

Assays involving adoptive transfer will not be curated at this time.

Epitopes Relevant to IEDB

Length/Mass Restrictions

Table 1. Length/Mass Restrictions
Epitope Class | Uncuratable | Epitopic Region (# residues) | Epitope (# residues)
B-cell | > 5 kDa or 50 amino acids | 12 - 50 | 1 ≤ x ≤ 11
Class I | > 5 kDa or 50 amino acids | 12 - 50 | 7 ≤ x ≤ 11
Class II | > 5 kDa or 50 amino acids | 16 - 50 | 7 ≤ x ≤ 15

The database will only include epitopes of less than 50 residues in either a linear or conformational sequence. If the epitope is non-peptidic, its mass must be less than 5000 Daltons (5 kDa) for it to be included in the database.
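As a rough illustration, the residue ranges in Table 1 can be expressed as a small lookup. This is only a sketch: the function name and dictionary layout are assumptions made for the example, it follows the table in treating 50 residues as the upper bound of an epitopic region, and it does not model the mass limit or the Botulinum toxin and anthrax exception.

```python
# Residue-count ranges transcribed from Table 1. The dictionary layout and the
# function name are illustrative assumptions, not part of ICS.
LENGTH_RULES = {
    "B-cell":   {"epitope": (1, 11), "region": (12, 50)},
    "Class I":  {"epitope": (7, 11), "region": (12, 50)},
    "Class II": {"epitope": (7, 15), "region": (16, 50)},
}


def classify_by_length(epitope_class: str, n_residues: int) -> str:
    """Return the Table 1 column that a peptide of this length falls into."""
    rules = LENGTH_RULES[epitope_class]

    low, high = rules["epitope"]
    if low <= n_residues <= high:
        return "Epitope"

    low, high = rules["region"]
    if low <= n_residues <= high:
        return "Epitopic Region"

    if n_residues > 50:
        return "Uncuratable"

    # Table 1 gives no category for lengths below the epitope minimum.
    return "Not covered by Table 1"
```

For example, `classify_by_length("Class I", len("SIINFEKL"))` falls in the Epitope column, while a 16-residue Class II peptide is treated as an epitopic region.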

Important Exception: Epitopes greater than 50 residues will be curated for certain pathogens including Botulinum toxin and anthrax epitopes.

A region or fragment of >50aa from B. anthracis and C. botulinum will be curated as an epitopic region in the following cases:

Figure 1. Curation of Botulinum toxin and Anthrax epitopes.


Well-Characterized Epitopes

To broaden the spectrum of information in the database, we currently exclude the repeated curation of epitopes once 10 key references have been included in the database. The original articles describing the epitope, MHC restriction data, antibody responses, and articles containing novel information regarding the epitope will be included in the approximately 10 references. A compiled list of "well-characterized" epitopes is listed in Table 2 and can also be found in the Curation Network folder (\\Curation\CurationNotes\blacklisted.xls).

Table 2. Well-characterized epitopes
# | Common Name | Sequence | Positions | Source Species | Source Protein Name | Restriction Allele
1 | TT Universal Helper epitope | QYIKANSKFIGITE | 830-843 | Clostridium tetani | Tetanus toxin | DRB1*1302
2 | OVA 257-264 | SIINFEKL | 257-264 | Gallus gallus (Chicken) | Ovalbumin | Kb
3 | Ova 323-339 | ISQAVHAAHAEINEAGR | 323-339 | Gallus gallus (Chicken) | Ovalbumin | H2-Ag7, RT1.B1
4 | HEL 46-61 | NTDGSTDYGILQINSR | 46-61 | Gallus gallus (Chicken) | Hen Egg white Lysozyme | H-2 IAk
5 | HBV core 18-27 | FLPSDFFPSV | 18-27 | Hepatitis B | Core Protein | A2
6 | SL9 | SLYNTVATL | 77-85 | HIV | Gag-p17 | A*0201
7 | TAX 11-19 | LLFGYPVYV | 11-19 | HTLV | Transcriptional activator (tax) | A*0201
8 | SYFPEITHI | SYFPEITHI | 367-375 | Human | Tyrosine kinase JAK1 | Kd
9 | MART-1(27-35) | AAGIGILTV | 27-35 | Human | MART-1 (Tumor antigen) | A*0201
10 | NY-ESO-1 epitope | SLLMWITQC | 157-165 | Human | Cancer/testis antigen NY-ESO-1 | A*0201
11 | Tyrosinase 370D | YMDGTMSQV | 368-376 | Human | Tyrosinase (Tumor antigen) | A*0201
12 | gp100 210M | IMDQVPFSV | 209-217 | Human | Glycoprotein (melanocyte lineage-specific antigen) | A2
13 | HER-2/neu 689-697 | RLLQETELV | 689-697 | Human | HER-2/neu (Tumor antigen) | A*0201
14 | HER-2/neu 369-377 | KIFGSLAFL | 369-377 | Human | HER-2/neu (Tumor antigen) | A*0201
15 | PLP 136-151 | HSLGKWLGHPDKF | 136-151 | Human | Myelin proteolipid protein | H-2 IAs
16 | MBP Ac1-11 | Ac-ASQKRPSQRSK | 1-11 | Human | Myelin Basic protein | H-2 IAu
17 | CMV pp65 | NLVPMVATV | 495-503 | Human cytomegalovirus | Phosphoprotein 65 (pp65) | A*0201
18 | M1 | GILGFVFTL | 58-66 | Influenza | Matrix Protein | A*0201
19 | Flu HA 307-319 | PKYVKQNTLKLAT | 307-319 | Influenza | Haemagglutinin | DRB1*0101, DRB1*0401 (DR4Dw4), DRB1*0701, DRB1*1101, DRB5*0101 (DR2a)
20 | PA 224-233 | SSLENFRAYV | 224-233 | Influenza | Acid Polymerase | H-2 Db
21 | NP 366-374 | ASNENMETM | 366-374 | Influenza | Nucleoprotein | H-2 Db
22 | NP 147-155 | TYQRTRALV | 147-155 | Influenza | Nucleoprotein | H-2 Kd
23 | HA 110-120 | SFERFEIFPKE | 110-120 | Influenza | Haemagglutinin | H-2 IEd
24 | LCMV gp33 | KAVYNFATC | 33-41 | LCMV | Glycoprotein | Kb, Db
25 | LCMV gp 276 | SGVENPGGYCL | 276-286 | LCMV | Glycoprotein | Db
26 | LCMV np 396 | FQPQNGQFI | 396-404 | LCMV | Nucleoprotein | Db
27 | LCMV np 118-126 | RPQASGYM | 118-126 | LCMV | Nucleoprotein | H-2 Ld
28 | LLO 91-99 | GYKDGNEYI | 91-99 | Listeria monocytogenes | Listeriolysin O | H-2 Kd
29 | LLO 215-226 | SQLIAKFGTAFK | 215-226 | Listeria monocytogenes | Listeriolysin O | H-2 IEk
30 | LLO 190-201 | NEKYAQAYPNVS | 190-201 | Listeria monocytogenes | Listeriolysin O | H-2 IAb
31 | P60 449-457 | IYVGNGQMI | 449-457 | Listeria monocytogenes | p60 | H-2 Kd
32 | P60 217-225 | KYGVSVQDI | 217-225 | Listeria monocytogenes | p60 | H-2 Kd
33 | f-MIGWII | MIGWII | 1-6 | Listeria monocytogenes | LemA | H2-M3
34 | NANP | NANP | | Plasmodium falciparum (malaria) | Circumsporozoite protein |
35 | MCC 88-103 | SYIPSAEKI | 252-260 | Plasmodium berghei | Circumsporozoite protein | H-2 Kd
36 | MCC 88-103 | ANERADLIAYLKQATK | 88-103 | Macrobrachium malcolmsonii (moth) | Cytochrome c | H-2 IEk
37 | Hsp 234-252 | LREAAEKAKIELSSSQSTS | 234-252 | Mycobacterium tuberculosis | Heat shock protein (HSP) 70 | RT1.B
38 | PCC 88-104 | KAERADLIAYLKQATAK | 88-104 | Pigeon | Cytochrome c | H-2 IEk

The above list, plus any well-characterized epitopes from additional sources encountered as the range of curated subjects is expanded, is to be curated following the exceptions noted below. We avoid curating papers that focus on exploring strategies to enhance the immunogenicity of well-characterized epitopes. These papers use well-characterized peptides to evaluate variables such as alternative immunogen constructs (MAPS/vectors) or adjuvants.


Important Note: When a well-characterized epitope is presented in the context of an epitopic region or domain, the longer peptide will be considered a well-characterized epitope.


Important Exceptions:

  • When well-characterized epitopes are studied in a novel host in terms of the MHC, TCR, antibody, or otherwise, the reference will be included in the database.
  • When well-characterized epitopes are used as controls they will not be curated. However, when the well-characterized epitope is used as a reference for a novel epitope from the same source protein, the well-characterized epitope will also be captured.
  • When a reference describes a well-characterized epitope in addition to other curatable epitopes, all of the epitopes from the reference will be captured.
  • At the discretion of the curators or the EC, papers describing new or additional original data relating to these epitopes can and will be curated, for example if the paper makes a novel discovery about the epitope.

HIV/SIV

References describing only epitopes derived from HIV/SIV will not be included in the database. However, when a manuscript describes epitopes from other relevant sources as well as HIV/SIV, all of the epitopes in the manuscript will be captured.

Imported Data

When curating data previously imported from other databases, the curator should apply all curation rules and recurate any data as needed in order to comply with IEDB rules. This includes entering new epitopes, deleting epitopes not conforming to our criteria, and adding any new contexts as needed. References which originate from an import should be labeled as such on the reference page. Each imported epitope should also be labeled as originating from a data import. For example, a phrase such as "Data originally imported from the HLA Ligand Database" should be entered whenever applicable.

Epitopes entered through direct submission will not be recurated; rather, the publication regarding the directly submitted data will be curated as a new reference. Knowledge of the directly submitted data can be helpful in gathering epitope sequences and checking epitope structural data.

Curation Prioritization

Epitope curation should be conducted in the following priority order:

A) NIAID Category A, B, and C priority pathogens and toxins:

The complete list of NIAID Category A, B, and C priority pathogens and toxins can be found at the following URL: http://www2.niaid.nih.gov/biodefense/bandc_priority.htm.

NIAID – Category A

Bacillus anthracis (anthrax)

Clostridium botulinum

Yersinia pestis

Variola major (smallpox) and other pox viruses

Francisella tularensis (tularemia)

Viral hemorrhagic fevers

Arenaviruses

LCM, Junin virus, Machupo virus, Guanarito virus

Lassa Fever

Bunyaviruses

Hantaviruses

Rift Valley Fever

Flaviviruses

Dengue

Filoviruses

Ebola

Marburg

NIAID – Category B

Burkholderia pseudomallei

Coxiella burnetii (Q fever)

Brucella species (brucellosis)

Burkholderia mallei (glanders)

Ricin toxin (from Ricinus communis)

Epsilon toxin of Clostridium perfringens

Staphylococcus enterotoxin B

Typhus fever (Rickettsia prowazekii)

Food and Waterborne Pathogens

Bacteria

Diarrheagenic E.coli

Pathogenic Vibrios

Shigella species

Salmonella

Listeria monocytogenes

Campylobacter jejuni

Yersinia enterocolitica

Viruses (Caliciviruses, Hepatitis A)

Protozoa

Cryptosporidium parvum

Cyclospora cayetanensis

Giardia lamblia

Entamoeba histolytica

Toxoplasma

Microsporidia

Additional viral encephalitides

West Nile Virus

LaCrosse

California encephalitis

VEE

EEE

WEE

Japanese Encephalitis Virus

Kyasanur Forest Virus

NIAID – Category C

Tickborne hemorrhagic fever viruses

Crimean-Congo Hemorrhagic fever virus

Tickborne encephalitis viruses

Yellow fever

Multi-drug resistant TB

Influenza

Other Rickettsias

Rabies

Severe acute respiratory syndrome-associated coronavirus (SARS-CoV)

B) Emerging and Re-emerging pathogens:

Pathogens newly recognized in the past two decades

Acanthamebiasis

Australian bat lyssavirus

Babesia, atypical

Bartonella henselae

Ehrlichiosis

Encephalitozoon cuniculi

Encephalitozoon hellem

Enterocytozoon bieneusi

Helicobacter pylori

Hendra or equine morbilli virus

Hepatitis C

Hepatitis E

Human herpesvirus 8

Human herpesvirus 6

Lyme borreliosis

Microsporidia

Parvovirus B19

Coronaviruses/Severe Acute Respiratory Syndrome (SARS)

Re-emerging Pathogens

Enterovirus 71

Prion diseases

Streptococcus, group A

Staphylococcus aureus

Coccidioides immitis

C) Transplant rejection antigens and other alloantigens, Allergens, and Self antigens involved in autoimmunity

D) Infectious diseases not listed above under sections (#Curation Prioritization) A) and B)

E) Epitopes associated with cancer

Minimal Data Requirements

The reference must contain information for all required fields for at least one epitope in order to be included in the database. There are five major categories containing twenty-six fields. One set of data from each category must be available. These fields are highlighted in yellow in the data dictionary available in the Curation Network folder: (\\Curation\Docs\DataDictionary\). The required fields are listed in Table 3.

Table 3 Required Fields
# Section Classification Field Name Comments
1 a   Reference - Journal Article PubMed ID At least one set of fields from Category # 1 (1a, 1b or 1c) has to be filled out.
  b i Reference - Submission Author(s)
    ii Reference - Submission Affiliation(s)
  c   Reference - Patents Patent Publication Number
2 a   Epitope Structure Linear Sequences At least one of the three fields from Category # 2 (2a,2b or 2c) has to be filled out.
  b   Epitope Structure SMILES Structure
  c   Epitope Structure Conformational Sequence
3 a   Epitope Structure Epitopic Region / Domain Mandatory field. This Boolean field indicates whether the epitope that is captured is a minimal epitope or contained within a region / domain.
4 a   Epitope Source Epitope Source Nature At least one of the seven fields from Category # 4 (4a, 4b, 4c, 4d, 4e, 4f or 4g) has to be filled out. If the value of natural Antigen, which is a Boolean field, is ’no’, all other Epitope-Source fields are ignored.
  b   Epitope Source Source Species
  c   Epitope Source Gene Name
  d   Epitope Source Protein Name
  e   Epitope Source GenBank ID
  f   Epitope Source Swiss Prot ID
  g   Epitope Source PDB ID
5 a i MHC Binding MHC Allele At least one set of fields from Category # 5 (5a, 5b, 5c, or 5d) has to be filled out. All the fields in a subsection have to be filled out if that subsection is selected. For example, if 5a is chosen, all three fields (5a-i, ii, iii) have to be filled out. The fields Assay Type and Qualitative Measurement can be entered as "Unknown" if the data is unavailable. It is anticipated that most of the data imported from other existing databases might not have the assay-related fields.
    ii MHC Binding Assay Type
    iii MHC Binding Qualitative Measurement
  b i MHC Ligand Elution MHC Allele
    ii MHC Ligand Elution Assay Type
    iii MHC Ligand Elution Qualitative Measurement
  c i T Cell Response - Assay MHC Allele
    ii T Cell Response - Assay Assay Type
    iii T Cell Response - Assay Qualitative Measurement
  d i B Cell Response - Assay Assay Type
    ii B Cell Response - Assay Qualitative Measurement
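The "at least one set per category" logic of Table 3 can be sketched as a simple check on a record. This is a minimal sketch under stated assumptions: every dictionary key below is a hypothetical name invented for the example and is not an ICS field identifier, and the rule that a natural-antigen value of "no" causes the other source fields to be ignored is not modelled.

```python
# All dictionary keys in this sketch are hypothetical names chosen for
# illustration; they are not ICS field identifiers.
def _has(record: dict, key: str) -> bool:
    """True when the field is present and not empty ("Unknown" counts as filled)."""
    return record.get(key) not in (None, "", [])


def missing_categories(record: dict) -> list:
    """Return the Table 3 categories (1-5) that the record does not satisfy."""
    missing = []

    # Category 1: PubMed ID, OR author(s) plus affiliation(s), OR patent number.
    if not (
        _has(record, "pubmed_id")
        or (_has(record, "authors") and _has(record, "affiliations"))
        or _has(record, "patent_publication_number")
    ):
        missing.append(1)

    # Category 2: at least one of the three structure fields.
    structure = ("linear_sequence", "smiles_structure", "conformational_sequence")
    if not any(_has(record, key) for key in structure):
        missing.append(2)

    # Category 3: the Boolean must be set explicitly (False is a valid value).
    if not isinstance(record.get("epitopic_region"), bool):
        missing.append(3)

    # Category 4: at least one of the seven source fields.
    source = ("source_nature", "source_species", "gene_name", "protein_name",
              "genbank_id", "swiss_prot_id", "pdb_id")
    if not any(_has(record, key) for key in source):
        missing.append(4)

    # Category 5: at least one assay subsection with ALL of its fields filled.
    three = ("mhc_allele", "assay_type", "qualitative_measurement")
    assay_sets = {
        "mhc_binding": three,
        "mhc_ligand_elution": three,
        "t_cell_response": three,
        "b_cell_response": ("assay_type", "qualitative_measurement"),
    }
    if not any(
        all(_has(record.get(name) or {}, field) for field in fields)
        for name, fields in assay_sets.items()
    ):
        missing.append(5)

    return missing
```

An empty list means the record meets the minimal data requirements as read here; otherwise the returned numbers point to the Table 3 categories that still need data.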


Epitope Structure Availability

In the event the exact epitope structure is not given in the reference, follow these guidelines:

I) Contact the corresponding author(s) using the template contact letter provided in the Curation Folder based upon information given in the manuscript. In the internal Curation Tracking System (CTS):

A) Fill out F2: Status should be "Waiting for author’s response". Enter author’s e-mail address and date of e-mail in Comments section of F2 form.
B) Once author provides structure(s): Complete the curation of the article, including "Provided by author" in the Data Location Field in Epitope Structure.
C) Update F2, adding comments to include by whom and when the information was sent.

II) If an e-mail address is not provided in the article, a reference cited by the manuscript regarding the epitope structure may be used for sequence information. The [Epitope Structure] Data Location Field should state where the information was found as cited reference or author communication.

The following format should be used anytime literature is cited in the IEDB: Sette et al. (2007). Nature 48: 1141-7. [PMID:12345678]

When citing epitope location, “Reference cited” should be used in location field and citation information should be placed in comments. Provide the PMID whenever it can be easily obtained.
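The citation format above can be produced consistently with a small helper. This is a sketch only: the function and parameter names are assumptions, and it always writes "et al." because the manual's single example does not show how single-author papers should be cited.

```python
def format_citation(first_author, year, journal, volume, pages, pmid=None):
    """Build a citation string in the style of the manual's example.

    The PMID is appended only when it is available.
    """
    citation = f"{first_author} et al. ({year}). {journal} {volume}: {pages}."
    if pmid is not None:
        citation += f" [PMID:{pmid}]"
    return citation


# Reproduces the placeholder example given in the manual.
print(format_citation("Sette", 2007, "Nature", 48, "1141-7", 12345678))
```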

III) If doubts persist regarding the epitope structure, the data cannot be included in the database. If the author does not respond within two weeks, the reference will be deemed uncuratable. Curation status in Form F2 should be selected as <Uncuratable: Reference Scan> with comments including the reason(s) why the epitope/reference was deemed uncuratable.


Important Note (Cited References): The same guidelines are to be applied when researching context information such as the generation of T cell clones or monoclonal antibodies. When researching clone production and mAb generation, only go to references cited by the paper that you are curating. If those references do not provide the needed information, do not continue to search further. Add an immunization comment stating that the details were not provided and clarify this point to the reviewer on the cover sheet, letting the reviewer know that you did look at the references, but still could not find the needed information.

How to Curate

General Considerations

The Curation Notes focus on detailed rules for handling data related to specific topics such as epitope structure, B and T cell responses, and negative data. When approaching a curation, one must take into consideration the main conclusions of the paper as revealed in the title and abstract. Rather than being buried in the comments, the main conclusions should be clear in the database and easily available to the user in a search. Be careful to avoid omission of critical data by blindly following curation rules. Be true to the integrity of the data and always consider the end user’s perspective.

Prevailing Rules

Certain database rules supersede all other criteria. Consider the full scope of the data prior to curation.

  • Due to the broad specificity of MHC molecules, all MHC binding data is captured. All MHC data receive separate entries and are not candidates for bulk curation.
  • All naturally processed epitope data is captured. All naturally processed data receive separate entries and are not candidates for bulk curation.
  • Do not capture positive or negative controls.
  • Do not repeat data. If the database fields are used properly, each type of data should be entered only once, in its proper field.
  • Comments are to only be used when needed to clarify the information present in the other fields, not to repeat the information present in the other fields.

Comments

There are comment fields throughout the database in order to give the curator the ability to capture VITAL information that would otherwise not be captured by the available fields of the database. The following guidelines should be used when entering comments.

Comments are not required and should always be as brief as possible. Comments should only contain information specific to the pertinent field. In other words, the Comments on Assay field should provide only details pertinent to the assay, its results, or interpretation.

  • Comments should stand alone and be self-explanatory. The database user should not need to obtain the actual reference in order to understand the comments. Do not refer to figures in a manuscript in the comments fields unless the accompanying comment remains self-explanatory even if the data location were to be removed. The preferred format for alluding to other figures or tables is in parentheses or braces at the end of the sentence.
  • Be tactful and neutral. Refer to any unavailable or unclear information in an inoffensive manner. The fact that particular information is not available in the manuscript does not require mention unless it would be confusing for the end user otherwise. These fields are not meant to explain curation strategies to reviewers, but to convey critical information to end users.
  • Avoid the use of uncommon abbreviations. All users should be able to understand the comments. Acceptable abbreviations used in the database will conform to a standard abbreviation list compiled from five journals (J Immunol, PNAS, J Virol, Eur J Immunol and Cancer Immunity). A compiled list of standard abbreviations can be found in the Appendix (section (#Standard Abbreviations)).
  • Comments should be written in proper American English to ensure consistency and allow for standardized querying. Check spelling and grammar and use complete sentences (noun and verb). Remember to include articles and prepositions such as "the" and "of". Capitalize the first letter of the first word in a sentence and only capitalize appropriate words in the text (T cell, M. tuberculosis). Sloppy writing reduces the credibility of the database. Do not use contractions.
  • Do not describe routine details of the assay. These fields are not meant to describe the experimental protocol, but rather to clarify what is present in the other fields and are to be used only when needed.
  • Do not repeat the information already captured in other fields.
  • Avoid interpretation of the manuscript. It is preferable to paraphrase author’s interpretations or conclusions from article text.

Specific Fields

Epitope Structure

This section captures the molecular structure of an epitope, which is defined as the structure interacting with receptors of the immune system (T cell, B cell/ antibody, MHC).


Analogs

Analogs are synthetic constructs of peptide sequences or chemical compounds that share some structural features in common with another sequence or compound. They are often used to determine the role of specific amino acids in the binding or immunogenicity of an epitope. The source of an analog is always artificial.

Analogs are not captured as separate epitope entries, but are referenced in Comments on Assay fields or entered as a context of the wild type sequence, unless (a) they are used in MHC binding contexts, or (b) they are used in assays in which the wild type sequence (in the form of epitope, source antigen, or source species) is not used as either the antigen or the immunogen. See sections (#Key Residues) or (#Linear Epitopes) for further instructions on strategies for curation of many commonly encountered analogs.

All analog sequences accompanied by MHC binding data or sequences that are designed to improve upon the immunogenicity of an epitope must be entered as separate epitope entries. Additionally, all contexts that do not use the natural epitope as either the immunogen or antigen, but instead use only the analog sequence will also require the analog be entered as a separate epitope entry. An example of such an assay would be the use of a CTL line that was raised to the analog lysing APC presenting the analog.

Be sure to state in the Epitope Comments field which wild type source the analog was derived from in order to retain some source information for the artificial epitope.

Modified Amino Acids

The Post-Translational Modification Type field is used to describe modified amino acids (naturally-occurring, post-translational modifications, or chemical/synthetic modifications). Here is an example for an N-formylated peptide:

Figure 2. Describing N-formyl Peptides (PubMed ID: 11145694).

Do not designate the formyl-methionine residue as "fM" as shown in Figure 2, since this conflicts with our use of lower case symbols to refer to D-amino acids. Instead, enter the N-formylated peptide sequence information into ICS as illustrated below (Table 4):

Table 4. Example for Entering Modified Amino Acids
Field | Text
Linear Sequence | MIGWII
Modification Type | <Formylation |FORM>
Modified Sequence/Residue | M1

In the Modified Sequence field, use the single letter amino acid code followed by the position of the residue in the epitope as it is captured in the Linear Sequence field. At present ICS does not accommodate the entry of multiple modification types. When a reference describes multiple modification types, consult with a senior curator or Epitope Council member to complete the curation. IEDB conforms to SWISSPROT when capturing post-translational modification types (http://www.expasy.org/tools/findmod/findmod_masses.html). Other modification types are added to the drop-down as specified in the reference after consultation with a senior curator.
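The Modified Sequence notation (one-letter code followed by the 1-based position within the captured Linear Sequence) lends itself to a quick consistency check. The helper below is a hypothetical sketch, not an ICS feature; its name and behaviour are assumptions for illustration.

```python
import re


def check_modified_residue(linear_sequence: str, modified_residue: str) -> bool:
    """Check that an entry such as "M1" names a residue actually present
    at that 1-based position of the linear sequence."""
    match = re.fullmatch(r"([A-Za-z])(\d+)", modified_residue)
    if match is None:
        # Not in "<letter><position>" form, e.g. the discouraged "fM".
        return False

    letter, position = match.group(1), int(match.group(2))
    if not 1 <= position <= len(linear_sequence):
        return False

    # Compare case-insensitively, since lower case marks D-amino acids.
    return linear_sequence[position - 1].upper() == letter.upper()
```

With the Table 4 values, `check_modified_residue("MIGWII", "M1")` passes, whereas `"fM"` or `"M2"` would be rejected.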

Important Note: In the event a reference describes the use of an unmodified epitope and additionally that same epitope in a modified form (glycosylated, ubiquitinated, etc.), the modified epitopes are to be curated following the rules used to curate analogs (#Analogs). They will not be captured as a separate epitope entry unless the criteria mentioned in (#Analogs) are met.

Important Note: All modifications are to be entered via the Modification Type field and not in the SMILES Structure field.

Important Note: Do not guess the correct modification to use. If the exact term that the authors used is not present, refer to a senior curator regarding synonyms for modifications or the need to add new terms to the drop down list.

Mimotopes

Mimotopes are functional mimics of natural molecular structures which bear little or no sequence homology to their biological counterparts. Mimotopes should be captured as separate epitope records. Follow the guidelines below to capture mimotopes. If a biological homolog is identified from the mimotope structure and is tested, the naturally occurring sequence should also be captured as a separate epitope context.

  • Enter the mimotope sequence or structure in the appropriate Sequence fields (Linear Sequence, Conformational Sequence or SMILES structure).
  • Specify "Yes" in Author Identified Mimotopes field.
  • In the [Epitope Structure] Comments field record the name or give brief information regarding the naturally occurring structure that the mimotope mimics.
  • If the mimotope has a natural source antigen, specify those details using the [Source Antigen] fields.
  • If the mimotope does not have a natural source, select the [Source Antigen] as either No Natural Source or Phage Display from the source finder. <Phage display library> has the accession number of SRC-1627.

Amino Acid Configuration

The amino acids recorded in the Linear Sequence and Conformational Sequence fields are assumed to be in the L-amino acid configuration by default. If a D-amino acid configuration is reported, record the corresponding D-amino acids in lower case (SIINFeKL instead of SIINFEKL for a D-glutamate residue) and mention in the Epitope Comments field that the amino acids in lower case are in the D-amino acid configuration.
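Because the lower-case convention encodes configuration in the sequence string itself, the D-amino acid positions can be recovered mechanically. The function below is an illustrative sketch; its name is an assumption and it is not part of ICS.

```python
def d_amino_acid_positions(sequence: str) -> list:
    """Return (1-based position, one-letter code) for each lower-case residue,
    i.e. each residue recorded in the D-amino acid configuration."""
    return [
        (position, residue.upper())
        for position, residue in enumerate(sequence, start=1)
        if residue.islower()
    ]


print(d_amino_acid_positions("SIINFeKL"))  # [(6, 'E')]
```

A list like this could help when drafting the accompanying note for the Epitope Comments field.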

Key Residues

When critical residues for the recognition of a linear epitope by T cell or B cell receptors, antibody, or MHC are described in a reference, this information is context dependent. Use the Assay Comments field to describe the key residues. The experiments identifying key residues of linear epitopes are not captured as separate contexts.

Important Exception: When conformational epitopes are defined in a reference by only key residues, the key residues should be entered in the Conformational Sequence field. Thus, in assays identifying a single amino acid as a key residue, that amino acid will be entered in the Conformational Sequence field unless there is evidence that it is part of a linear epitope.

We assume that a mAb recognizing a conformational epitope defines the epitope, unless otherwise specifically stated by the authors.

Linear Epitopes

When alanine-scanning mutagenesis or other residue substitutions are used to determine key residues within a linear epitope sequence, the key residues should be captured in the Comments on Assay field.

The sequences utilized to deduce key residues are not captured in separate contexts. Rather, in the Comments on Assay field, capture in one or two short and concise statements how the residues were determined to be key. Use standard amino acid notation to denote key residues (Amino acid one-letter code and its residue number e.g. L107). The principal prerequisite for capturing epitopic determinant data is the demonstration of antibody or TCR binding to the native epitope sequence with appropriate controls. All MHC binding data and all naturally processed data are always curated, even when used to deduce key residues.

Figure 3 (pasted from reference with PubMed ID: 15213134) shows an example of alanine substitution data that should not be curated as separate contexts, but may be summarized as a comment. The Epitope Comments field should state "Alanine substitutions for residues F295, T300, Y301, and Y302 eliminated T cell activation, identifying these as critical amino acid residues in the epitope".
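To keep such summaries uniformly worded, the comment can be assembled from the list of key residues. This is a sketch under stated assumptions: the function name and the `effect` parameter are invented for the example, and the wording assumes two or more residues.

```python
def key_residue_comment(residues, effect="eliminated T cell activation"):
    """Compose a one-sentence key-residue summary in the style of the example.

    `residues` uses standard notation (one-letter code plus residue number).
    """
    if len(residues) < 2:
        raise ValueError("this sketch expects two or more key residues")

    if len(residues) == 2:
        listed = " and ".join(residues)
    else:
        listed = ", ".join(residues[:-1]) + ", and " + residues[-1]

    return (
        f"Alanine substitutions for residues {listed} {effect}, "
        "identifying these as critical amino acid residues in the epitope."
    )


print(key_residue_comment(["F295", "T300", "Y301", "Y302"]))
```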

Conformational Epitopes

When capturing conformational epitopic determinants, select <Discontinuous> in the Continuous sequence? field and then enter the residues into the Conformational Sequence field utilizing standard amino acid notation. If the positions listed by the author for the involved residues do not match the positions of those same residues in the Swiss-Prot source, enter the Swiss-Prot numbering along with the residues utilizing standard amino acid notation in the Swiss-Prot Positions field.


Important Note: It is important to always check the source cited for discontinuous epitopes to determine if the residues you are entering do indeed occur and at what positions they occur in the source.

The primary data to capture is a demonstration of antibody binding to the native sequence in non-denaturing assays (ELISA, X-Ray Crystallography, NMR).

As with linear epitopic determinants, the peptide sequences of all mutations or variants tested are not captured as individual epitopes. Enter relevant details regarding the variants or mutants in the Comments on Assay field and information regarding the determination of the epitope structure in the [Epitope Structure] Comments field.

Viral Escape Mutations

When entering viral escape studies into the database, the emphasis should be on antibody binding to residues of the wild-type sequence.

Binding of a monoclonal antibody (mAb) or antisera to the wild-type sequence must be demonstrated along with a loss of binding to escape mutants in order for the data to be included in the database. The negative binding data of the mAb or antisera to escape mutants is not curated.

Each mAb should be treated as though it defines a separate epitope, unless explicitly stated otherwise by the authors. If the panel of monoclonal antibodies is used to characterize binding to an antigen, follow the guidelines described in section (#Panel of Monoclonal Antibodies).

Multiple amino acid residues identified by a series of binding experiments may be captured as a single entry in the Conformational Sequence field if identified as belonging to a single epitope by the authors. Individual binding experiments (inhibition, neutralization, ELISA) will be captured as separate contexts of the conformational epitope.

It is not necessary to capture the substituted amino acid residues found in mutants; however, this information can be entered in the Comments on Assay field.

Information regarding the method used to determine the epitope sequence should be summarized in the [Epitope structure] Comments field.

Figure 3.  Experimental data demonstrating critical amino acid residues of an epitope are not curated as separate contexts.


Deduced Epitopes

A deduced epitope is one that is not directly tested as an isolated structure, but rather deduced by the authors by various methods such as the use of overlapping peptide scans. Deduced epitopes may be curated; however, those epitopes defined only by computer prediction algorithms in the absence of validating experimental data will not be included in the database. Follow the flow chart depicted in Figure 4 to determine how to curate deduced epitopes.

Additionally, when curating the use of multiple peptides (as antigen) in order to deduce an epitope, only one context should be curated using the optimal or minimal peptide as the antigen with a comment regarding the use of multiple peptides.

Minimal Epitopes vs Optimal Epitopes

The minimal epitope is the shortest length or smallest structure that produces a cellular or humoral response (is immunogenic) or can serve as an antigen (is antigenic). For MHC restricted responses, a minimum of 7 aa is required for a sequence to be considered an epitope (see guidelines in section (#Truncation)).

The authors may describe the epitope with the greatest response as the optimal epitope. There are cases where the minimal epitope does not necessarily give the optimal response. This situation is common with Class II and Antibody/B cell epitopes.

In the event the minimal epitope is not the optimal epitope, the optimal epitope will be captured instead of the minimal epitope. For example, in Figure 5 (PubMed ID: 15048720) the optimal responses in panels a and c are produced by the longer peptides (represented by diamond symbols) and thus are not the minimal epitopes. In this instance, the optimal epitopes in panels a and c will be captured instead of the minimal epitopes.

Minimal Epitope vs Fine Specificity

Fine specificity refers to the detailed pattern of reactivity of different T cell clones or monoclonal antibodies when recognizing the same epitope due to changes of the components such as individual amino acids or sugars within an epitope. Typically, multiple T cell clones and monoclonal antibodies will recognize different amino acids as key residues within the same stretch of the protein sequence. Follow the guidelines below when there are fine specificities reported for a group of T cell clones or monoclonal antibodies.

  • Class I epitopes: if T cell clones respond optimally to amino acid sequences of different lengths in a truncation analysis, each of these sequences will be entered as separate epitopes, even though they may contain the same core sequence.
  • Class II epitopes: if multiple T cell clones respond to an amino acid sequence that satisfies the criteria of an epitope (e.g. 15 amino acids or less), but differ in their individual reactivities within that stretch of amino acids (e.g. clone 1 recognizes aa 1-13 and clone 2 recognizes aa 3-15), then the longer sequence containing all of the residues recognized by all of the clones will be entered as a single epitope (e.g. aa 1-15). This is considered a single epitope with different fine specificities.
  • B cell epitopes: Linear Antibody/B cell epitopes will be entered in the database similarly to Class II epitopes as above. For conformational Antibody/B cell epitopes, assume that each mAb defines its own epitope unless otherwise indicated by data or stated by the author.

These guidelines will be overruled when the authors consider the epitopes distinct, in which case they will be entered as individual epitopes.
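The Class II rule above (clone 1 recognizes aa 1-13, clone 2 recognizes aa 3-15, so aa 1-15 is entered) can be sketched as a merge of position ranges. This is an illustration only: the function name is hypothetical, and the 15-residue limit is taken from the example criterion quoted in the bullet.

```python
def merge_fine_specificities(clone_ranges, max_length=15):
    """Merge the (start, end) ranges recognized by individual clones.

    Positions are 1-based and inclusive. Returns the single range that
    contains every residue recognized by any clone. Raises ValueError when
    the merged stretch is longer than max_length, because the rule only
    applies to a sequence that itself satisfies the epitope criteria.
    """
    if not clone_ranges:
        raise ValueError("at least one clone range is required")
    start = min(s for s, _ in clone_ranges)
    end = max(e for _, e in clone_ranges)
    if end - start + 1 > max_length:
        raise ValueError("merged stretch exceeds the epitope length criterion")
    return start, end


# Clone 1 recognizes aa 1-13 and clone 2 recognizes aa 3-15.
print(merge_fine_specificities([(1, 13), (3, 15)]))
```

The sketch does not model the override described above: when the authors consider the epitopes distinct, they are entered individually regardless of the merged range.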

Figure 5.  Example for curating minimal epitope vs optimal epitope.

Epitopic Region/Domain

The Epitopic Region/Domain field indicates whether the epitope is a minimal epitope or is contained within a region or a domain. The choices are Defined Epitope, Epitope Containing Region/Antigenic Site, and Residues Involved In Recognition.

Define the epitope as a defined epitope if the authors state that it is defined, optimal, or minimal. If the authors do not specify whether the epitope is minimal or not, then use the Table in Section (#Length/Mass Restrictions) and also the guidelines developed by the IEDB Epitope Council below to determine the correct designation:

  • For linear Antibody/B Cell and Class I epitopes, if the sequence is 11 residues or less in length (with a minimum of 7 aa for class I epitopes, see (#Truncation)), the epitope should be designated as a Defined Epitope. If the sequence is 12 residues or greater, Epitope Containing Region/Antigenic Site should be selected.
  • For linear Class II epitopes, select Defined Epitope if the sequence is 15 residues or less in length. Select Epitope Containing Region/Antigenic Site if the epitope sequence is 16 residues or greater in length.
  • For nonlinear/discontinuous epitopes and the curation of key residues, select Residues Involved In Recognition, Epitope Containing Region/Antigenic Site or Defined Epitope as applicable.
  • For epitopes with both Class I and Class II contexts or both B cell and T cell contexts, define the epitope under the positive context. In the event both B and T cell assays are positive, define the epitope under the T cell guidelines.
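The length rules for linear epitopes listed above can be summarized in a short sketch. The function and the type labels ("b_cell_linear", "class_i", "class_ii") are hypothetical names chosen for illustration; the sketch covers only linear epitopes for which the authors give no designation.

```python
DEFINED = "Defined Epitope"
REGION = "Epitope Containing Region/Antigenic Site"

# Longest sequence still designated as a Defined Epitope, per epitope type.
MAX_DEFINED_LENGTH = {"b_cell_linear": 11, "class_i": 11, "class_ii": 15}
CLASS_I_MIN_LENGTH = 7


def designate_linear_epitope(epitope_type, sequence, authors_state_defined=False):
    """Return the Epitopic Region/Domain value for a linear epitope."""
    if authors_state_defined:
        # Authors call the epitope defined, optimal, or minimal.
        return DEFINED
    if epitope_type not in MAX_DEFINED_LENGTH:
        raise ValueError(f"unknown epitope type: {epitope_type!r}")
    length = len(sequence)
    if epitope_type == "class_i" and length < CLASS_I_MIN_LENGTH:
        raise ValueError("Class I sequences shorter than 7 aa are not epitopes")
    return DEFINED if length <= MAX_DEFINED_LENGTH[epitope_type] else REGION


print(designate_linear_epitope("class_i", "SIINFEKL"))
print(designate_linear_epitope("class_ii", "A" * 16))
```

Nonlinear epitopes, key residues, and epitopes with mixed contexts still require the judgment described in the last two bullets.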

Important Note: The size criteria used to determine epitopic region designations apply regardless of the qualitative data present for the epitope. For example, if only negative data is present in the reference for a particular structure, the designation as Defined Epitope will still be used if the size criteria are met. The external database will define structures having no positive contexts as "distinct structures" rather than "distinct epitopes".

Important Note: Residues involved in recognition are not captured when they are involved in the structural features of the recognition. That is, if a mutation results in a significant structural change, this mutation cannot be assumed to be involved in recognition by the antibody or TCR and is not captured as an epitope.

Ambiguous Cases for Designating a Sequence as an Epitope

Typically the assignment of a sequence as an epitope is unambiguous. However there are a number of scenarios in which more than one structure may be assigned as the epitope within the context of a single assay. Accordingly there may be ambiguity in deciding which structure is to be considered the epitope.

  • The sequence or molecular structure of an epitope may be shown to be conserved in several different antigens and in different species. The database only allows reference to one source antigen/source species per epitope entry. In order to enter multiple source species, multiple epitope entries are required.
  • Cross-reactivity studies may utilize an epitope molecular structure conserved across multiple source antigens or species; however, only reference to one source may be used per epitope entry in ICS.
  • Experimental data might reflect the use of an immunogen and an antigen derived from different source antigens or species, creating confusion regarding whether the immunogen or antigen should be designated as the epitope under which the experimental data will be curated.

Use the following guidelines to determine the structure to be entered as the epitope:

Case 1: If both the immunogen and the antigen are naturally existing molecular structures or both are artificial and one of them is an epitopic region while the other is a minimal epitope, the minimal epitope will be the captured epitope and is entered in the [Epitope Structure] category.

Case 2: If the immunogen and the antigen are both naturally existing molecular structures or both are artificial and both of them are minimal epitopes, then both sequences will get captured in separate entries as epitopes. The end result of this is that each epitope will receive an identical copy of the assay context. Duplication of assay contexts is necessary to ensure that both epitopes receive equal priority in the database.

Case 3: If either the immunogen or antigen is artificial, that is, it does not exist in nature while the other is a naturally existing molecular structure, the natural structure will be designated as the epitope and entered in the [Epitope Structure] category. The artificial antigen or immunogen will be specified in the context (as assay antigen or immunogen) of the curated natural epitope.
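Cases 1 to 3 above can be read as a small decision procedure. The sketch below is one possible encoding; the `Structure` class and its field names are hypothetical, and the combination in which both structures are epitopic regions of the same origin is not covered by the cases, so the sketch refuses to decide it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Structure:
    name: str
    is_natural: bool  # exists in nature (True) or artificial (False)
    is_minimal: bool  # minimal epitope (True) or epitopic region (False)


def structures_to_capture(immunogen, antigen):
    """Return the structure(s) to enter in [Epitope Structure]."""
    if immunogen.is_natural != antigen.is_natural:
        # Case 3: one natural, one artificial -> the natural structure.
        return [immunogen if immunogen.is_natural else antigen]
    if immunogen.is_minimal and antigen.is_minimal:
        # Case 2: both minimal -> separate entries with duplicated contexts.
        return [immunogen, antigen]
    if immunogen.is_minimal != antigen.is_minimal:
        # Case 1: one region, one minimal epitope -> the minimal epitope.
        return [immunogen if immunogen.is_minimal else antigen]
    raise ValueError("combination not covered by Cases 1-3; discuss in curation meeting")


region = Structure("immunogen region", is_natural=True, is_minimal=False)
minimal = Structure("antigen peptide", is_natural=True, is_minimal=True)
print([s.name for s in structures_to_capture(region, minimal)])
```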


Important Note: When an epitope sequence is present in more than one source and the other potential sources of the epitope are not specifically tested, but rather this information is only commented upon, the epitope is assigned to the source which was tested in the reference. In contrast, if several natural sources containing the epitopic sequence are studied in the reference, the epitope should be curated under separate entries for each source.

Conservancy/Cross-Reactivity

These guidelines determine the process used to curate data relating to conservancy and cross-reactivity.

This section is being reevaluated in light of all other rules regarding cross reactivity, novelty points, etc. New guidelines may soon appear.

Case 1: When an epitope is analyzed for conservancy or cross-reactivity among different natural proteins or pathogens with experimental data presented in the reference for each of the different sources, the different structures should be entered in the database as separate epitopes for each of the different natural proteins or pathogens according to section (#Ambiguous Cases for Designating a Sequence as an Epitope).

Case 2: When sequence or homology analysis of an epitope and/or source protein data is present, the significance of the data should be mentioned in the [Epitope Structure] Comments field.

Case 3: When an artificial peptide is used for conservancy analysis among multiple proteins / source species, only ONE [Epitope Structure]/[Source Antigen] pair is curated with all others recorded in the Comments on Assay field. The significance of the data will be used to determine the chosen epitope-source antigen pair.

Important Note: These cross-reactivity and conservancy rules are applied only to different species and NOT to different strains. Different strains are curated according to the guidelines specified in section (#Decision Scheme for Bulk Curation).

Epitope Source/Source Antigen

The natural source from which the epitope was derived is always entered as the [Source Antigen] for the epitope. For example, if a gene encoding for a peptide in the Hepatitis A Virus envelope protein is inserted into a Vaccinia virus vector, the [Source Antigen] of the epitope is the Hepatitis A Virus envelope protein rather than the Vaccinia virus vector.

The protein ID provided by the authors will be entered into the database. When the authors do not specify a sequence ID, these guidelines will be followed:

With an epitope formed by a continuous amino acid sequence, NCBI’s Protein BLAST is used to identify the sequence in the Swiss-Prot Database:

  • Use the "Search for short, nearly exact matches" tool under PROTEIN BLAST links found at http://www.ncbi.nlm.nih.gov/BLAST/
  • Enter the epitope sequence in the Search box and select <swissprot> under the Choose Database drop-down.
  • BLAST the sequence and find the exact match.
  • The Swiss-Prot ID from the GenBank file, found in the DBSOURCE tag, will be entered into the database.
  • When an exact match is not available, BLAST the sequence against "Non Redundant" databases by selecting "nr" under the Choose Database drop-down. When entering the GenBank ID, the GI number is used.

Epitopes from Display Libraries

When an epitope is identified through the use of a bacteriophage, baculovirus, or other randomized peptide library, it must be determined whether the phage-derived epitope sequence is homologous to a sequence from a biological source.

  • If a natural homolog has been identified, then its sequence will be captured as the epitope.
  • If a corresponding natural homolog was not identified in the reference, then select the [Source Antigen] as Phage Display from the source finder. <Phage display library> has the accession number of SRC-1627.


Carbohydrate Epitopes

The source of carbohydrate epitopes can be vague or confusing at times as carbohydrates may be present across many species. Always capture the source as described by the authors. If the authors mention multiple potential sources, assign the most generic source which captures all of the mentioned species. For example, if multiple strains of the same species are mentioned, curate the source for the generic species. If different species of the same family are mentioned, curate the source as the generic family.

Source Species/Strain

The species name from which the epitope sequence originates must be entered into the [Source Antigen] Source Species Taxonomy field. Often the epitope sequence originates from an organism such as a virus, bacterium, eukaryotic organism, or, less often, a plant. In order to enter the species into this field a Find function is used.

Use the Find button to access the Species/Virus Finder and type the name of the organism into the blank box followed by clicking on the Search Taxonomy button. Scroll through the entries for the genus and species name until reaching the appropriate strain. Double-click on the strain in order to enter it into the Source Species Taxonomy field.

Important Note: When the NCBI Species/Virus Finder provides both the source species and the strain, that option will be entered into the Source Species Taxonomy field. The strain must also be re-entered in the Strain field.

When the strain given by the authors is not found in the Species/Virus Finder, the genus and species name will be entered into the Source Species Taxonomy field and the strain name provided by the authors must be entered into the Strain field.

When the strain of the source species is not specifically mentioned in the reference and all uses of the source species are performed with a particular strain, the epitope may be attributed to that strain, provided the exact epitopic sequence is found in that strain. For example, when the authors describe a Clostridium botulinum epitope and do not mention which type it was derived from, but always immunize and assay with Type A, the epitope may be assigned the source species of Clostridium botulinum Type A.

The same rules apply for species and strain when entered in the fields under the [Immunogen], [Antigen], and [Carrier] categories.

Important Note: Remember that the exact sequence of all epitopes must be found within the source antigen to which they are assigned. In the event that the epitopic sequence cannot be found within the source the authors describe, an IEDB Source ID may be assigned. Follow the guidelines below to obtain and use IEDB Source IDs.

When and How to use an IEDB Source ID

Natural sequence or non-natural sequence?

First it is important to identify if the epitope is natural or unnatural according to the authors. If the authors state the sequence was derived from a naturally occurring source (for example, sequenced from a patient), then whatever source (GenBank, SwissProt, or IEDB) is used, it should have a natural source. If the authors state that the sequence was artificially created or is a lab induced mutation or analog of a wild type sequence, then the epitope should have NO Natural Source.

What if a sequence is not found in SwissProt/GenBank databases?

If the epitope is natural, first try to BLAST the sequence in SwissProt. If no EXACT matches are found, then try a GenBank BLAST.

Important Note: if the manuscript provides a long peptide sequence that does not match by BLAST, be sure to BLAST only the epitope sequence rather than the entire peptide sequence.

If still no match can be found by BLAST, then scrutinize where the mismatch occurs and confirm that the sequence you entered in the BLAST is the exact sequence specified in the manuscript. If there were no errors on your part, then determine if a typo was likely. Sometimes when a scanned document is converted to a PDF, errors occur in sequences, placing unusual characters in the place of residues. An author may repeat a single aa unintentionally; for example, the author states SIIINFEKL instead of SIINFEKL. With very obvious typos (for example, the sequence is correct in several places in the manuscript, but entered with a typo in one location), it is acceptable to enter the correct, BLAST-matching sequence as the epitope. However, if there is any uncertainty, the author MUST be contacted in order to clarify the EXACT sequence used in the assays. Follow the guidelines for author contact and CTS use as applicable. If the authors fail to respond within 2 weeks and/or you cannot obtain a reliable source for the corrected sequences, proceed to assign an IEDB Source ID.
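To scrutinize where a mismatch occurs, the reported sequence can be compared against the closest database sequence. The following sketch uses `difflib` from the Python standard library; the function name is hypothetical, and the output only locates the difference. It does not decide whether the difference is a typo.

```python
from difflib import SequenceMatcher


def locate_mismatches(reported, reference):
    """List the differences between a reported and a reference sequence.

    Each item is (operation, fragment_in_reported, fragment_in_reference),
    where operation is 'replace', 'delete', or 'insert' as seen from the
    reported sequence. An empty list means the sequences are identical.
    """
    matcher = SequenceMatcher(None, reported, reference, autojunk=False)
    return [
        (tag, reported[i1:i2], reference[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# The manuscript states SIIINFEKL; the BLAST-matching sequence is SIINFEKL.
for difference in locate_mismatches("SIIINFEKL", "SIINFEKL"):
    print(difference)
```

A single extra or replaced residue is consistent with the typo scenarios described above, but the rule on author contact still applies whenever there is any uncertainty.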

IEDB Source IDs

IEDB source IDs are a list of molecules that are used as the source antigen when the SwissProt/GenBank databases do not have a matching source for a given epitope or when the source is non-proteinaceous.

Natural IEDB Source ID

If you are certain your sequence does not match anything by BLAST AND the author states the ENTIRE sequence being entered into the epitope sequence field is natural, then you should obtain or use an IEDB Source ID with a natural source. First check if an IEDB Source ID exists with the correct source antigen name for your case and that your species is part of the higher taxonomical category group the IEDB source is assigned to. If so, use this ID, and in YOUR epitope source field select the appropriate species and strain as provided in the reference. All uses of IEDB Source IDs are subject to review so be sure that you are using it correctly. If the particular ID you need is not present, then contact Laura Z. to obtain an IEDB Source ID and continue your curation with a placeholder source ID and a note on your cover sheet stating that the ID must be changed before promoting the reference.

Unnatural IEDB Source ID

In the event an epitope sequence is unnatural, such as an artificial analog of a natural sequence, use an IEDB Source ID with an artificial source: SRC2066 IEDB No Natural Source Artificial Peptide/Protein is the most common selection. Important Note: When a natural sequence repeated an unnatural number of times is to be entered as the epitope sequence (because the natural sequence is not used in the curated assays), it should be entered as artificial & thus have no natural source.

SRC1627 Phage display library is to be used when an artificial peptide is obtained through panning of a Phage display library. The use of this IEDB source allows grouping of peptides that differ in the methodological basis of their discovery from analogs based upon a natural sequence.

Important Note: When a library is based upon a natural sequence, Phage display library (SRC1627) should be selected as the source.

Obtain an IEDB Source ID

To obtain an IEDB ID, email LZ (laura@liai.org) with the following information: PMID, epitope name and/or epitope sequence, location, & details regarding if & when the author was contacted. Laura will determine the correct IEDB Source ID to generate and notify you once it has been added.



Quasispecies/Minor Species

In the event the epitopic sequence is demonstrated to mutate over time, each new sequence represents a new epitope, antigen, or immunogen. If the new sequence matches a SwissProt/GenBank entry, that source may be used as the epitope source antigen. If the variant sequence does not match an existing sequence, an IEDB source should be assigned.

Epitope Positions in Source Antigen

The positions of the epitope provided by the authors may not match the positions given by Swiss-Prot (or GenBank ID).

The author specified positions are always recorded in the Epitope Starting Position and Epitope Ending Position fields.

The position numbers provided by Swiss-Prot will be recorded in the Epitope Swiss-Prot Positions field in the format starting position-ending position (for example, 120-129) and will be used only if there is a discrepancy between author reported and Swiss-Prot/GenBank positions. This discrepancy may be mentioned in the [Source Antigen] Comments field.

Be sure to enter the exact starting and ending positions as the authors state, even when these values contain errors. This field is the AUTHOR specified field, not the real positions.

Important Note: Repeating Epitopes
When exactly the same epitope sequence occurs at multiple positions in the same source antigen and reactivity is not demonstrated to be site specific, the positions of the first occurrence will be entered in the position fields with comments regarding the repetitive nature of the sequence entered in the epitope Comments field.
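The position rules above (exact match within the source, "start-end" format, first occurrence for repeating epitopes) can be sketched as follows. The function names and the source sequence in the example are made up for illustration.

```python
def find_epitope_positions(epitope, source_sequence):
    """Return 1-based inclusive (start, end) for every exact occurrence."""
    if not epitope:
        raise ValueError("epitope sequence is empty")
    positions = []
    index = source_sequence.find(epitope)
    while index != -1:
        positions.append((index + 1, index + len(epitope)))
        # Continue one residue later so overlapping repeats are also found.
        index = source_sequence.find(epitope, index + 1)
    return positions


def first_occurrence_field(epitope, source_sequence):
    """Format the first occurrence as 'start-end' (for example, 120-129)."""
    occurrences = find_epitope_positions(epitope, source_sequence)
    if not occurrences:
        raise ValueError("epitope sequence not found in the source antigen")
    start, end = occurrences[0]
    return f"{start}-{end}"


# Hypothetical source sequence in which the epitope occurs twice.
source = "MKTAYIAKQRAYIAKQ"
print(find_epitope_positions("AYIAKQ", source))
print(first_occurrence_field("AYIAKQ", source))
```

More than one occurrence signals a repeating epitope, which calls for the comment described above. Author-specified positions are still recorded exactly as stated, even when they differ from the computed ones.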

MHC Binding

Experimental data characterizing the interaction between an epitope and an MHC molecule is entered under the tab labeled MHC binding.

MHC Allele(s) that bind the epitope should be recorded exactly as specified by the authors. When a specified MHC Allele is not available for selection through the Allele Finder tool, please contact a senior curator or team member to add an allele to the finder.

Alleles that are mutated in the binding region in order to study the importance of residues in binding are not curated, but are commented on under the natural allele context (if present). Alleles that are mutated outside of the binding region are curated as the wild type allele. For example, a chimeric allele will be captured as the wild type of the binding portion.

Naturally occurring mutations in alleles are curated and should be added to the Allele finder if not present.

When both α- and β- chains of HLA class II alleles are specified in the reference, both of them will be recorded. Consult with EC members if you have any questions relating to the curation of MHC Alleles.

Important Note: All experimental MHC binding data given in a reference will be entered in the database irrespective of whether the qualitative assessment is positive or negative. This may not be bulk curated under any circumstances. TCR antagonism used in an MHC binding inhibition assay will be captured as an MHC binding context.

Important Note: In the curation of MHC binding assays, the exact peptide sequence that is tested MUST be entered as the epitope. There are no carriers possible in these assay types.

Qualitative Measurement

The Qualitative Measurement field is a required field and accepts only <Positive> or <Negative> as its value. The determination of positive or negative binding is established by the authors. The following guidelines are used to record qualitative measurement.

Case 1: When authors specify a qualitative assessment in the reference as either positive or negative, their assessment will be recorded in the database.

Case 2: When a qualitative assessment can be inferred from the information in the manuscript (threshold is provided), this assessment will be entered into the database.

Case 3: When no qualitative assessment is provided by the authors, this data will not be entered into the database.
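The three cases above can be sketched as a single function. The function name and parameters are hypothetical. In particular, the direction of the threshold (whether lower or higher values count as binding) depends on the assay and is not specified in this manual, so it is exposed as a parameter; treating a value equal to the threshold as positive is likewise an assumption.

```python
def qualitative_measurement(author_assessment=None, value=None,
                            threshold=None, lower_is_positive=True):
    """Return 'Positive', 'Negative', or None when the data is not entered."""
    if author_assessment is not None:
        # Case 1: the authors' own assessment is recorded as given.
        if author_assessment not in ("Positive", "Negative"):
            raise ValueError("assessment must be 'Positive' or 'Negative'")
        return author_assessment
    if value is not None and threshold is not None:
        # Case 2: infer the assessment from a threshold in the manuscript.
        if lower_is_positive:
            positive = value <= threshold
        else:
            positive = value >= threshold
        return "Positive" if positive else "Negative"
    # Case 3: no assessment available, so the data is not entered.
    return None


# Hypothetical numbers: measured value 50 against a stated threshold of 500.
print(qualitative_measurement(value=50, threshold=500))
```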

MHC Ligand Elution

Only experimental data in which the authors elute peptides or ligands from a cell or purified receptors such as MHC is entered here. The peptide or ligand is then detected in the eluate through sequencing (Mass Spectrometry/Edman Degradation) or by a specific T cell with a known recognition of that peptide. These methods are used in order to demonstrate that cells will naturally process an antigen and present the epitope on the cell surface or bound to a relevant receptor. This specifically excludes epitopes given directly in the assay.

The processing of artificial antigens created in order to study processing is not curated. The processing of analogs created in order to study processing of the wild type sequence is also not captured. Processing of natural antigens should always be curated.

Important Note: When curating MHC Ligand Elution contexts in which the antigen presenting cells are incubated with the source species of the epitope, the protein from which the epitope was derived, or a fragment of the source antigen, the antigen type should not be <epitope>, but rather the larger antigen from which the cells derived the epitope should be selected (source protein or source species that was processed).

When the origin of the eluted peptide is not specifically known, the following guidelines apply:
  • If an epitope of viral origin is eluted, the antigen type is <source species>.
  • If an epitope of known self origin is eluted, the antigen type is <source protein>.
  • If an epitope of unknown origin is eluted, the antigen type is <other>.

Qualitative Measurement

Section (#Qualitative Measurement) applies.

T Cell Response

Minimum Criteria for Curation

Presence of Epitope

In general, polyclonal data generated without using the epitope as either the immunogen or the antigen is not entered in the database for that specific epitope.

Table 5 serves as a guide to determine curatable contexts of polyclonal responses.

Table 5. Curatability of polyclonal T Cell/Antibody/B Cell Contexts
(rows: Immunogen type; columns: Antigen type)

Immunogen \ Antigen | Epitope | Source Protein | Source Species | Other
Epitope             | Curate  | Curate         | Curate         | Curate
Source Protein      | Curate  | Do not curate* | Do not curate* | Do not curate*
Source Species      | Curate  | Do not curate* | Do not curate* | Do not curate*
Other               | Curate  | Do not curate* | Do not curate* | Do not curate*

Important Exception: When curating a deduced epitope, and neither the immunogen nor the antigen is the epitope, the assays performed to deduce the epitope should be curated and the antigen type should be either <Fragment of Source Antigen> or <Other>.
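Table 5 and the exception for deduced epitopes reduce to one condition: a polyclonal context is curatable when the epitope is the immunogen or the antigen, or when the assay was performed to deduce the epitope. A minimal sketch, with a hypothetical function name:

```python
TYPES = {"Epitope", "Source Protein", "Source Species", "Other"}


def polyclonal_context_is_curatable(immunogen_type, antigen_type,
                                    deduced_epitope=False):
    """Apply Table 5, including the exception for deduced epitopes."""
    for value in (immunogen_type, antigen_type):
        if value not in TYPES:
            raise ValueError(f"type not listed in Table 5: {value!r}")
    if immunogen_type == "Epitope" or antigen_type == "Epitope":
        return True
    # 'Do not curate*' cells: curatable only for assays used to deduce the epitope.
    return deduced_epitope


print(polyclonal_context_is_curatable("Source Protein", "Epitope"))
print(polyclonal_context_is_curatable("Source Protein", "Source Species"))
```

The sketch covers only the four types listed in Table 5. For a deduced epitope the antigen type is recorded as <Fragment of Source Antigen> or <Other>, as stated in the exception, and monoclonal contexts follow their own rule.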

Curatability of Monoclonal Contexts (monoclonal antibodies and T cell clones): If the specificity of the monoclonal receptor is shown, known, or implied (for example, the T cell clone is generated by in vitro stimulation with the epitope) and the clone or mAb is tested for reactivity to source protein/species, or to a different antigen, this data can be curated for the epitope even though neither the immunogen nor the antigen is the epitope.

Pool of Peptides

When the antigen or immunogen is comprised of a pool of peptides and the result of the assay is negative, each peptide of the pool will be entered in the database (when provided by the reference). When an epitope is tested both as part of a pool and alone, those contexts may be bulked under the epitope used alone with a comment stating the same results were demonstrated through the use of a peptide pool.

When the response to a pool of peptides is positive, this data is generally not entered in the database but the de-convolution (testing of individual peptides within the pool) experiments are curatable, if available. However, there may be cases in which pools of overlapping peptides are used to define epitopic regions/domains without further de-convolution. If the peptide pool defines a continuous epitopic region/domain ≤ 50 amino acids, the data will be curated under the entire amino acid sequence encompassed by the peptide pool. The Epitope Comments field should explain that the epitope is actually comprised of a peptide pool.

TCR

TCR rearrangements (Vβ chain type, etc.) are not captured in the current database structure. However, such data should be described in the Comments on Assay field under the appropriate T cell response.

T Cell Phenotype

T cell phenotype data (upregulation of co-stimulatory molecules like CD45) are not captured in the current database structure, but can be mentioned in the "Comments on Assay" field under the appropriate T cell response.

Tolerization Data

Data related to tolerization is not entered into the database. TCR antagonism is also not captured in the current database structure. This feature will be considered for future releases.

Antigen/Immunogen Fields

These fields clarify the relationships between the data entered in the Epitope Structure fields and the Antigen or Immunogen fields.

Antigen/Immunogen Type

Antigen and Immunogen Types are provided in a drop down menu composed of the following:

Epitope-This is to be used when the exact epitope structure is being used as the antigen or immunogen.

Important Exception: When a carrier or vector is used with the epitope structure, "Epitope" is selected. When the antigen or immunogen is the epitope presented in a different chemical type, for example DNA rather than peptide, "Epitope" is also selected.

Important Note: When a peptide linker is added to an epitope in order to link it to a carrier, the antigen or immunogen is to be curated as <Epitope>; the linker residues should be entered into the Comments on Assay or Immunization Comments fields.

Source Protein-This selection is made when the complete source antigen of the epitope is used as either the immunogen or the antigen.

Source Species-This is selected when the exact species and strain from which the epitope is derived is used in the assay.

Fragment of Source Antigen-This is to be used when any naturally occurring fragment of the source antigen that contains the epitopic sequence and is larger than the epitope structure is used. This selection is also used to capture the study of recognition of degradation analysis and enzyme or chemically treated fragments of the source antigen.

Important Note: When a number of fragments of the source antigen are tested, these fragments may be bulked into one assay context. Curate the fragment giving the best result and comment on the other fragments that were used.

Other Species/Strain-This is selected when a source species (virus, bacteria, etc) other than the exact species and strain from which the epitope was derived is used as either the immunogen or the antigen.

Other-Any immunogen or antigen that cannot be described by any of the previous choices will be labeled as "Other". Common examples of this type include peptides originating from species or strains other than those the epitope originated from and analog peptides.

BLANK-The Immunogen Type may be left blank in certain circumstances. These will occur ONLY when the Immunization category is selected as: Unknown, Cancer, Phage Display, Autoimmune, or No Immunization.

Important Note: Remember that when the immunization procedure is unknown or uncertain, the Immunization Category is selected as <Unknown> and the Immunogen Type is left BLANK.
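The two rules on blank Immunogen Types can be expressed as a consistency check. This is a sketch only; the function name is hypothetical and the category strings are copied from the list above.

```python
# Immunization categories for which the Immunogen Type may be left blank.
BLANK_ALLOWED = frozenset(
    {"Unknown", "Cancer", "Phage Display", "Autoimmune", "No Immunization"}
)


def check_immunogen_type(immunization_category, immunogen_type):
    """Return a list of problems; an empty list means the pair is consistent."""
    problems = []
    is_blank = immunogen_type is None or immunogen_type.strip() == ""
    if is_blank and immunization_category not in BLANK_ALLOWED:
        problems.append("Immunogen Type may not be blank for this category")
    if immunization_category == "Unknown" and not is_blank:
        problems.append("Immunogen Type must be blank when the category is Unknown")
    return problems


print(check_immunogen_type("Unknown", None))
print(check_immunogen_type("Unknown", "Epitope"))
```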

Multi-epitope construct-This selection is not to be used. It will be removed from the list at a future date.

Peptide containing the epitope-This selection is made when describing unnatural peptide constructs containing the epitope and additional residues, but no other functional units (other epitopes). This is only used to describe an unnatural construct. Author statements, curator discretion, and curatability rules should be used to determine whether to capture the Imm/Ag as Epitope, with the additional residues mentioned in the Comments field, or whether the additional residues warrant capturing the Imm/Ag as Peptide containing the epitope. Different carriers linked to the same antigen or immunogen may be bulked. Curate the carrier giving the best response and comment on other carriers that were used.

Important Note: The curatability of peptides containing epitopes depends upon the sequence being recognized. For example, if the immunogen is the source antigen or the source species (containing the natural epitope sequence) and the antigen tested is the natural epitope sequence plus several unnatural residues, then the recognized, and thus curated, epitope is the sequence found in the natural source. The additional residues added for the purpose of the assay are either commented upon or captured as a peptide containing the epitope, as further described below. However, if an artificial construct composed of a natural peptide sequence plus several unnatural residues is used as both the immunogen and the antigen, then the entire artificial construct must be captured as the epitope and be given No Natural Source. Additionally, an epitope source comment should reflect the natural source from which the epitope was derived.

Important Note: A natural peptide sequence should never be defined as Peptide containing epitope.

Image:Slide1.jpg Image:Slide2.jpg Image:Slide3.jpg


Image:Linker.jpg

Important Note: When a peptide linker is added to an epitope in order to link it to a carrier, the antigen or immunogen is to be curated as <Epitope>; the linker residues should be entered into the Comments on Assay or Immunization Comments fields. Linker: A linker is a few residues used to link the epitope to another functional unit, which may be another epitope, a MAP construct, a vector, etc. The linker is not entered as a carrier. If the epitope is not linked to anything else, additional residues added to a natural epitope sequence are captured according to the above figure, either in comments or as part of a Peptide Containing the Epitope.

Important Note: Regardless of the Imm/Ag Type selected and the particular use of the Carrier, Comments, and Adjuvant fields, no single component of the Imm/Ag should ever be repeated. That is, if certain residues are captured under the sequence of PeptContEpitope as selected for Imm/Ag, then those residues should not also be entered in the Carrier field. Likewise, residues captured in the Carrier field should not be entered in the Imm/Ag field. The same concept applies to the use of the Adjuvant field.


Important Note: In the curation of MHC binding assays, the exact peptide sequence that is tested MUST be entered as the epitope. There are no carriers possible in these assay types.

Antigen/Immunogen Chemical Type

Chemical Type DNA vs Pept/Protein

In all cases in which the immunogen or antigen is delivered in DNA form (plasmid DNA, naked DNA, DNA-coated beads/particles, or DNA delivered by an organism), the following guideline applies: the Immunogen Type or Antigen Type field will reflect the translated product (protein) of the DNA that was delivered in the immunization or used as the antigen in the assay. That is, if the epitope was delivered in plasmid form, the Imm/Ag Type is <Epitope> and the Chemical Type field of the antigen or immunogen will be entered as <DNA>.

With the use of stably transfected cell lines expressing the epitope, the immunogen/antigen chemical type will be entered as the same as the epitope chemical type (protein). The cells expressing the epitope will be considered a carrier only when the cells expressing the epitope are used to immunize.

When a <multi-epitope construct> is used as an Antigen Type or Immunogen Type, only the corresponding [Antigen] or [Immunogen] Name fields must be completed. Details about backbone and/or other epitopes included in the construct are entered in the appropriate [Carrier] fields, including their sequences. This information is clarified in the appropriate Comment field.

Important Note: When the Antigen Type or Immunogen Type fields are entered as <Epitope>, <Source Protein> or <Source Species>, the remaining fields are autofilled. The curator should verify that all fields are accurate and may alter the chemical type and Imm/Ag Name if necessary.

Important Note: When a virus or cell expressing the epitope or antigen sequence is used as a vector, the information regarding the virus or cellular carrier is entered under the [Carrier] fields and the antigen/immunogen type is entered as the same type as the epitope.

Important Note: When the epitope or antigen is covalently bound to a vector, the information describing that structure is entered in the [Carrier] fields. When the epitope or antigen is not covalently bound to the vector, the information describing that vector is entered in the Formulation / Immunization fields.

Antigen Conformation Definition Field

This field describes the conformational type of the antigen used in a B cell assay as either native or non-native.

Native Select <Native> when no alteration to the tertiary structure of the tested antigen has been made. Native conformation is commonly accepted as the biologically active form of the protein (or other chemical type). This also includes synthetic or recombinant peptides identified by the authors as having native conformation.

The following will be considered enough evidence for choosing Antigen Conformation Recognized = Native:

Antigen = Source Antigen

OR

Antigen = Source Species

AND

Assay Type = one of the following:

  • Neutralization
  • Antibody dependent Cytotoxicity Assay
  • Challenge Assay
  • Cytopathic Effect Assay (CPE)
  • Hemagglutination-Inhibition
  • Calorimetry
  • Colony Immunoblot
  • Inhibition Assay

Non-native/unknown Select <Non-native/unknown> when the tested antigen is not in its native conformation. This would include short synthetic peptides and proteins that are deliberately denatured or denatured in their preparation. This value also includes antigens for which the physical nature of the antigen is neither stated by the author, nor decipherable from the paper (unknown).

Important Note: When authors state that the conformation of the antigen is native, and it is reasonable to believe so, the conformation should be entered as Native, even when the above criteria are not met. Examples of such situations are when assembled viral particles or sporozoites are used in an ELISA and the authors specify recognition of native antigen.

Important Note: Assays having different antigen conformations may be bulk curated if the outcome is the same. Bulk curate under the native conformation.
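
The criteria above can be read as a small decision rule. The following is only a minimal sketch, not part of IPS: the function name, the plain-string values, and the `credible_author_statement` flag are hypothetical, and the reading "(Source Antigen OR Source Species) AND a listed assay type" is an assumption about how the OR/AND is meant to group. Where one assay-type name ends and the next begins in the list is also this sketch's reading.

```python
# Assay types named in the criteria above (entry boundaries are an assumption).
NATIVE_EVIDENCE_ASSAYS = {
    "Neutralization",
    "Antibody dependent Cytotoxicity Assay",
    "Challenge Assay",
    "Cytopathic Effect Assay (CPE)",
    "Hemagglutination-Inhibition",
    "Calorimetry",
    "Colony Immunoblot",
    "Inhibition Assay",
}

# Antigen selections that count toward the criteria.
WHOLE_ANTIGEN_TYPES = {"Source Antigen", "Source Species"}


def antigen_conformation(antigen_type, assay_type, credible_author_statement=False):
    """Suggest a value for the antigen conformation field.

    credible_author_statement is the curator's judgment that the authors
    state native conformation and that it is reasonable to believe so.
    """
    if credible_author_statement:
        return "Native"
    if antigen_type in WHOLE_ANTIGEN_TYPES and assay_type in NATIVE_EVIDENCE_ASSAYS:
        return "Native"
    # Covers both non-native antigens and antigens of unknown conformation.
    return "Non-native/unknown"
```

For instance, a Source Species antigen in a Neutralization assay would come out as Native, while the same assay with a short synthetic peptide would come out as Non-native/unknown unless the authors credibly state otherwise.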

MHC Fields

The guidelines below are used to complete the MHC fields present under the T Cell Response and Naturally Processed sections. If restriction is demonstrated for a T cell context, each restriction that is shown is curated separately as a new context, with each allele listed once as the restricting allele. For example, if a CTL assay is done and restriction is known to be both HLA-A2 and HLA-A4, then two separate CTL assays will be curated, one with restriction of A2 and one with restriction of A4. Promiscuous binders are captured as multiple contexts if restriction is demonstrated in a manner to warrant the use of an evidence code. If the epitope is recognized by six alleles, for example, six contexts will be curated.

When potential restriction is discussed without experimental evidence, restriction is not captured. The author's comments should be captured under assay comments and the MHC Types Present field may be used. In cases where restriction is not fully demonstrated, but the MHC types of the responding population are known, these may be captured under MHC Types Present. The MHC Types Present field captures the common alleles shared by the responders among the group. Do not enter all MHC types tested, but rather all MHC types shared by the population demonstrated to recognize the epitope. The purpose of this field is to add potential restriction information in cases where exact restriction is not yet known.

Figure 6.  Handling MHC Fields.

Important Notes:

  • For inbred animals with known fixed alleles (for example, inbred mouse strains BALB/C, C57BL/6), leave the MHC Types Present field under Immunized Species and MHC Types Present under Target Cell Species BLANK because these fields will be assumed standard and possibly autofilled in the future.
  • If a restriction is not specified but the response is shown to be either a Class I or Class II response, then enter only the Class I alleles or the Class II alleles in the MHC Types Present field under the most appropriate category, [Immunized Species] or [Assay: Target Cells: Source Species], according to Figure 6.
  • In a population in which no allele(s) are shared, leave the [Immunized Species] MHC Allele(s) Present field BLANK.
  • If there are common allele(s) but none are specifically implicated as the restriction, then curate as MHC Types present.
  • MHC restriction should be captured when the epitope has a known MHC binding motif and the antigen presenting cells express that allele.
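
Two of the rules in this section are mechanical enough to sketch in code: one context per demonstrated restricting allele, and MHC Types Present as the alleles shared by all responders. This is an illustrative sketch only; the function names and the dictionary keys are hypothetical and do not correspond to IPS field identifiers.

```python
def restriction_contexts(assay_type, restricting_alleles):
    """Build one context per demonstrated restricting allele."""
    return [
        {"assay_type": assay_type, "mhc_restriction": allele}
        for allele in restricting_alleles
    ]


def mhc_types_present(responder_alleles):
    """Return the alleles shared by every responder, sorted for stable output.

    responder_alleles holds one collection of alleles per subject shown to
    recognize the epitope. An empty result means the field is left BLANK.
    """
    allele_sets = [set(alleles) for alleles in responder_alleles]
    if not allele_sets:
        return []
    return sorted(set.intersection(*allele_sets))


# Example from the text: restriction shown for both HLA-A2 and HLA-A4
# yields two separate CTL contexts.
contexts = restriction_contexts("CTL", ["HLA-A2", "HLA-A4"])

# Two responders sharing only HLA-A2.
shared = mhc_types_present([{"HLA-A2", "HLA-B7"}, {"HLA-A2", "HLA-B8"}])
```

Using a set intersection reflects the instruction not to enter all MHC types tested, only those common to the responding population.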


MHC restriction Evidence Codes

The curator assigns MHC restriction based upon the following definitions and evidence codes. Only one evidence code may be selected at a time; therefore, the code referring to the assay type which most narrowly defined the restriction of the epitope should be used. For example, if an antibody blocking assay was used to narrow the restriction to Class I and an assay using mismatched APC was used to define which Class I allele defines the restriction, the evidence code selected should be mismatched MHC molecules.

1) T cell assay indicating MHC restriction

  • T cell assay -Mismatched MHC molecules

Use of APCs expressing differing MHC molecules. The assignment of MHC restriction is based upon the use of a T cell assay in which animals or cells of differing MHC types are used to demonstrate MHC restriction.

  • T cell assay -MHC subset identification

The assignment of MHC restriction is based upon the use of a T cell assay in which antibody is used to block an MHC molecule or MHC molecule subset in order to demonstrate MHC restriction.

  • T cell assay -Single MHC type present

The assignment of MHC restriction is based upon only one MHC type being present in a T cell assay. For example, a CTL assay with APCs expressing a single type of MHC molecule.

  • T cell assay -Biological process measured

This code applies only to class restriction. For example, assignment of Class I restriction for an epitope used in a CTL assay. This is applied only when clear indication of restriction is present.

  • T cell assay -T cell subset identification

The assignment of MHC restriction is based upon the use of a T cell assay in which antibody is used to block a subset of T cells, a subset of T cells is depleted, or a pure T cell population is used in order to demonstrate MHC Class I or Class II restriction.

2) MHC binding assay

The assignment of MHC restriction is based upon the results of an MHC binding assay.

3) MHC binding prediction

The assignment of MHC restriction is based upon the use of a binding prediction based upon the sequence or structure of the epitope without experimental assay.

4) Statistical association

The assignment of MHC restriction is based upon associations of the study population’s MHC expression and recognition of the epitope.

5) Cited reference

The assignment of MHC restriction is based upon information provided in references cited by the author.


Effector cell assignment is made following these guidelines:

1. Phenotype identification - Direct demonstration of the effector cell phenotype will be used to assign effector cell type. For example, CD8+ staining of the population producing IFNg.

2. Cell isolation - Isolation or purification procedures will be used to identify the cell type of the effector cells present in the assay. For example, the use of a cell population after CD8 depletion would be identified as CD4+ T cells.

3. Biological process measured - The response measured by the assay type will not be used to identify the cell type of the effector cells. For example, measurement of proliferation may be an indicator of CD4+ T cells; however, if splenocytes were used in the assay, the effector cell type should be entered as splenocytes.

4. MHC restriction - MHC restriction of the epitope will not be used to assign the cell type of the effector cells used in the assay. That is, if PBMC are used in an assay utilizing a Class II epitope, PBMC will be entered into the effector cell field. However, assignment of specific MHC restriction or restriction to the level of Class I or Class II should be performed whenever possible and may be attributed to the assay type when the authors state or imply such.
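
The four guidelines above amount to: use direct evidence (phenotype or isolation) when it exists, and otherwise record the cell population actually used, never inferring the cell type from the assay readout or the MHC restriction. A minimal sketch follows; the function and parameter names are hypothetical, and checking phenotype before isolation is an assumption, since the guidelines do not rank the two.

```python
def effector_cell_type(cells_used, phenotype_shown=None, isolated_population=None):
    """Suggest the effector cell entry for an assay.

    cells_used          -- the population put into the assay (e.g. "splenocytes")
    phenotype_shown     -- cell type directly demonstrated, if any
    isolated_population -- cell type implied by isolation or depletion, if any

    Assay readout and MHC restriction are deliberately not parameters:
    per the guidelines they are never used to assign the cell type.
    """
    if phenotype_shown:
        return phenotype_shown
    if isolated_population:
        return isolated_population
    return cells_used
```

For example, a proliferation assay run on unfractionated splenocytes would still be entered as splenocytes, while a population remaining after CD8 depletion would be entered as CD4+ T cells.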

Immunization

Immunization Category

This field describes all exposures of the immune cells under study both in vivo and in vitro from initial exposure until the time of the assay. When subjects are vaccinated, cells are infected or stimulated in vitro, the selection will be <Administration>. <Administration> implies that the initial contact of the immune system or cells is by deliberate and controlled exposure. All other Immunization Category selections imply that the initial exposure to an antigen is by natural biological processes, e.g. infection, cancer, allergens, or autoimmunity. The possible selections are described in detail below.

Important Note: The Immunization Category field is independent from the Disease state and the [Immunogen] fields.

<Administration>
The immunogen was administered for the purpose of immunization or stimulation either in vitro or in vivo.

Important Note: Select <Administration> only when the Immunogen is known. Do not select <Administration> when the Immunogen is unknown, even when you can safely assume administration of the immunogen was performed. All selections of <Administration> must have the Immunogen field filled in.

<Allergy>
Subjects have a naturally occurring allergy against the immunogen (allergen) prior to the study.

<Allergy plus restimulation>
Cells are taken from individuals with a prior allergy and restimulated in vitro.

<Autoimmune (No Administered Immunization)>
The immunogen (if known) originates from the host, such as with effector cells believed to be specifically reacting to a self antigen when derived from a subject with an autoimmune disease.

<Autoimmune plus restimulation>
Cells are taken from a subject with an autoimmune disease and restimulated in vitro.

<Cancer (No Administered Immunization)>
Effector cells are taken from a cancer subject and are believed to specifically recognize cancer antigens.

<Cancer plus restimulation>
In vitro restimulation is performed on cells obtained from individuals with cancer.

<Natural Infection or Exposure>
Subjects are naturally infected prior to the study.

Assumptions: For certain ubiquitous pathogens such as Influenza, CMV, EBV and Candida, all individuals are presumed to have natural exposure to the source species (when not clearly stated by the reference). Similarly, PPD positive individuals are assumed to be naturally exposed to mTB pathogen when not specified in the reference. When curating samples taken from endemic areas, the immunogen may be assumed based upon location. For example, in the case of PBMC collected from healthy donors living in a malaria endemic region, the immunogen will be captured as P. falciparum.

<Natural Infection or Exposure plus restimulation>
In vitro restimulation was performed on cells taken from individuals who were naturally infected or exposed.

<Phage Display (No Immunization)>
Antibodies are obtained through the use of a phage display library. In this situation, all [Immunization] fields will be left blank.

<Unknown>
The immunization category is not specified in the reference. Select <Unknown> whenever the Immunogen is unknown. This can occur when an antibody is purchased from an outside lab or commercial company or a previously described cell line is used. All available information will be mentioned in the Comments on Immunization field.

Important Note: In cases with no immunization such as cancer, autoimmunity, or unknown natural exposure with additional in vitro restimulation, the restimulating antigen is entered as the immunogen.
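
Two of the Immunization Category rules above can be checked mechanically: <Administration> always requires a known Immunogen, and <Phage Display (No Immunization)> requires all [Immunization] fields to be blank. The sketch below is illustrative only; the function name, the argument names, and the example field name are hypothetical and are not IPS identifiers.

```python
def check_immunization(category, immunogen=None, other_fields=None):
    """Return a list of rule violations; an empty list means none were found.

    other_fields maps any further [Immunization] field names to their values.
    """
    problems = []
    # Names of additional immunization fields that actually hold a value.
    filled = [name for name, value in (other_fields or {}).items() if value]

    if category == "Administration" and not immunogen:
        problems.append("<Administration> requires the Immunogen field to be filled in")

    if category == "Phage Display (No Immunization)" and (immunogen or filled):
        problems.append("all [Immunization] fields must be blank for phage display")

    return problems


# An <Administration> entry without an immunogen is flagged;
# per the note above, such a record would instead be <Unknown>.
issues = check_immunization("Administration")
```

Other categories are passed through unchecked here because their rules depend on curator judgment rather than on field presence.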

Immunized Species

The species and strain or ethnicity of the immunized species is captured in this field. Different species cannot be bulked. Each new immunized species is curated as a new context. Different strains may be bulked in certain situations. When the outcomes of the assays are the same for B cell assays, different strains of mice may be bulked. All of the different strains should be entered in this field, separated by commas. Different strains may be bulked in T cell contexts if the outcome is the same and the MHC alleles are the same. The curator should use discretion and author statements to determine when different strains of animals should not be bulk curated.

Transgenic Mice

The following guidelines apply when the immunized species is a transgenic strain of mice.

  • The Species field will be "Mus musculus"
  • The Strain field will specify that it is a transgenic mouse (for example, BALB/C A2 Transgenic)
  • Human alleles will be recorded in the MHC Types Present field when the transgene contains an MHC allele; otherwise the guidelines in the MHC Fields section will be followed.

Ethnicity

Ethnicity can be defined as "of or relating to large groups of people classed according to common racial, national, tribal, religious, linguistic, or cultural origin or background".

Ethnicity will be entered in the Strain/Ethnicity/Population field when the ethnicity is clearly stated in the text and it can be found in the "Population" or "Population Area" lists at:

http://www.ncbi.nlm.nih.gov/projects/mhc/ihwg.cgi?cmd=PRJOV&ID=9

Information regarding the ethnicity will be entered in the Comments on Immunization field when the ethnicity is not clearly specified in the text, does not appear in the "Population" list, or a geographic region is used to describe the origin of the patients involved (geographic location does not necessarily correlate with ethnicity).

Disease State

The Disease State field reflects the disease state of the individual(s) used in the context. When multiple diseases are reported in the reference, the one disease which is most appropriate to the context will be entered while the others will be noted in the Comments on Immunization field.

Diseases which are the result of an administered immunogen are only to be captured in the Disease State field when evidence of disease is demonstrated (clinical symptoms, pathological evidence, author statement, etc.) and/or the disease is relevant to the assay being captured.

Assumptions: For certain ubiquitous pathogens such as Influenza, CMV, EBV and Candida, all individuals are presumed to have natural exposure to the source species (when not clearly stated by the reference). Similarly, PPD positive individuals are assumed to be naturally exposed to mTB pathogen when not specified in the reference.

Important Note: When curating samples taken from endemic areas, the immunogen may be assumed based upon location; however, the Disease State field is not used unless the authors indicate that the population has or had a specific disease. For example, in the case of PBMC collected from healthy donors living in a malaria endemic region, the Disease State field will be left empty and the immunogen will be captured as P. falciparum.

Disease Stage

The Disease Stage field reflects the stage of the disease at the time the experimental data was generated. The Disease Stage may be described as:

<Acute>
A short-term infection or disease characterized by a dramatic onset and rapid recovery is recorded as <Acute>. Primary infections fall under this category.

<Chronic>
A long-term infection or illness, or a partial remission, is recorded as <Chronic>.

<Post>
An illness a subject has recovered from is recorded as <Post>. Latent (potentially existing but not presently evident or realized) and remission (a period during which symptoms of disease disappear [complete remission]) are recorded as <Post>. Note that partial remission will be recorded as <Chronic>.

<Other>
Any disease stage that cannot be classified under <Acute>, <Chronic> or <Post> will be classified under <Other>. Cancer stages and household contacts will be recorded as <Other>.

<Unknown>
When the disease stage is not clearly specified or unavailable in the reference, it will be recorded as <Unknown>. The main distinction between <Other> and <Unknown> is that <Other> specifies a disease stage mentioned in the reference, but not classifiable as <Acute>, <Chronic>, or <Post> while <Unknown> specifies a disease stage which is not explicitly mentioned.
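
The Disease Stage definitions above can be summarized as a lookup with two fallbacks: a stage that is mentioned but does not fit goes to <Other>, and a stage that is not mentioned at all goes to <Unknown>. The sketch below is only an illustration; the description keys are hypothetical normalized phrases chosen by a curator, not terms the database recognizes.

```python
# Curator-normalized descriptions mapped to the selections defined above.
STAGE_BY_DESCRIPTION = {
    "acute": "Acute",
    "primary infection": "Acute",
    "chronic": "Chronic",
    "partial remission": "Chronic",
    "recovered": "Post",
    "latent": "Post",
    "complete remission": "Post",
    "cancer stage": "Other",
    "household contact": "Other",
}


def disease_stage(description):
    """Suggest a Disease Stage selection.

    description is None when the reference does not mention a stage.
    """
    if description is None:
        # Nothing stated in the reference.
        return "Unknown"
    # A stated stage that fits none of Acute/Chronic/Post falls under Other.
    return STAGE_BY_DESCRIPTION.get(description.strip().lower(), "Other")
```

Keeping the `None` check separate from the dictionary default mirrors the distinction the manual draws between <Other> and <Unknown>.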

Recording Multiple Immunizations

When there are multiple immunizations reported (possibly involving a number of different immunogens), only the first immunization and immunogen used is captured in the [Immunization] fields with the subsequent Immunizations being recorded in the Comments on Immunization field.

Important Note: When several immunization procedures, schedules, and doses are used, these may be bulked into one assay context. Curate the procedure giving the best results or the one most often used and comment on the other procedures.

Number of immunizations
This field describes the entire immunization schedule including dosing and formulations.

In vivo Prime / Boost
When multiple immunizations with a number of different immunogens are used, the first immunogen is recorded in the Immunogen field with information regarding subsequent immunizations recorded in Comments on Immunization.
It is very important to enter this information in the Comments on Immunization field because the future database design may be capable of capturing this information separately.
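
As a sketch of the prime/boost rule above, the first immunogen goes to the Immunogen field and everything after it goes to Comments on Immunization. The function name, the dictionary keys, and the comment wording are hypothetical; they only illustrate the split and do not reflect how IPS stores the data.

```python
def record_immunizations(immunogens):
    """Split an ordered list of immunogens into field entry and comment text."""
    if not immunogens:
        return {"Immunogen": None, "Comments on Immunization": ""}

    first, *subsequent = immunogens
    comment = ""
    if subsequent:
        # Later immunizations are preserved in order in the comment.
        comment = "Subsequent immunizations: " + "; ".join(subsequent)
    return {"Immunogen": first, "Comments on Immunization": comment}


# Hypothetical prime/boost schedule with two different immunogens.
entry = record_immunizations(["plasmid DNA prime", "recombinant protein boost"])
```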

In vitro restimulation
Effector cells may be taken from a host and restimulated in vitro prior to the assay. This restimulation may be performed with the epitope itself or an antigen containing the epitope in order to amplify an existing response to detectable levels in an assay.
When cells from an immunized (exposed) subject are washed after restimulation and before antigen is added to measure a T cell or B cell/antibody response, this is considered an in vitro restimulation and the in vitro restimulation information will be recorded in the In Vitro Immunization/Restimulation field. The in vivo immunization fields will reflect the in vivo immunization or exposure process. When cells are taken from an immunized subject and directly assayed in the presence of antigen, this is not considered restimulation.
When cells are derived from a naïve subject and first encounter the antigen in vitro, this is considered the immunization process and is not considered an in vitro restimulation.
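
The three cases described above can be written as a two-question decision. This is a minimal sketch under stated assumptions: the function name, the parameter names, and the returned labels are hypothetical, and a naive subject's cells are treated as undergoing in vitro immunization whenever they first meet the antigen in vitro, whether or not a wash step is involved.

```python
def classify_in_vitro_step(exposed_in_vivo, stimulated_and_washed_before_assay):
    """Classify how in vitro antigen contact should be treated.

    exposed_in_vivo -- True if the subject was immunized or exposed in vivo
    stimulated_and_washed_before_assay -- True if cells were cultured with
        antigen and washed before antigen was added for the assay readout
    """
    if not exposed_in_vivo:
        # Naive cells first encounter the antigen in vitro.
        return "in vitro immunization"
    if stimulated_and_washed_before_assay:
        return "in vitro restimulation"
    # Cells from an exposed subject assayed directly with antigen.
    return "direct assay (not restimulation)"
```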

Important Note: Growing dendritic cells in vitro, prior