ImmPort

HLA project report, November 2009: HLA Structure Variation

Science Commons is working under contract to ImmPort (Immunology Portal) to explore the use of Semantic Web technologies for that repository. The first use case investigates factors in the disease systemic sclerosis.

We'll use this page as a starting point for work done on this project or that spins off from it.

HLA-A is Entrez Gene 3105, Uniprot P04439. HLA-B is Entrez Gene 3106, Uniprot P01889.

/Demo_sketch, /Demo_queries

MaHCO - an MHC ontology (StemNet)

 * Ontology (with two .owl files) providing URIs for HLA alleles and MHC-related classes
 * Derived in part from IMGT/HLA database, but has additional intelligence
 * Outcome: Bundles/mahco

IMGT (at EBI)

 * IMGT = LIGM (immunoglobulins and T cell receptors) + HLA (alleles of MHC proteins)
 * Seems to consist mainly of sequence and alignment data. Some pubmed references too.
 * Linking opportunities: Medline; EBI sequence record.
 * download page.
 * On ashby: /work/imgt/hla/source/*.{fasta,msf,pir}
 * Outcome: Bundles/hla - a bundle that captures allele/medline associations from hla.dat.
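
A minimal sketch of how allele/Medline associations might be pulled out of hla.dat. It assumes the file follows EMBL flat-file conventions: two-letter line codes, a `DE` line whose first comma-separated field is the allele name, cross-references of the form `RX   PUBMED; <id>.`, and `//` ending each entry. That layout is an assumption and has not been checked against the actual download. The function name, the sample entries and the PubMed IDs are made up for illustration, and this is not necessarily how Bundles/hla was built.

```python
from collections import defaultdict

def allele_pubmed_pairs(lines):
    """Map each allele name to the set of PubMed IDs cited in its entry.

    Assumes EMBL-style lines: a two-letter code, three spaces, then the value.
    """
    refs = defaultdict(set)
    allele = None
    for line in lines:
        code, rest = line[:2], line[5:].strip()
        if code == "DE" and allele is None:
            # Assumed: the allele name is the text before the first comma.
            allele = rest.split(",")[0]
        elif code == "RX" and allele is not None:
            db, _, ident = rest.partition(";")
            if db.strip().upper() == "PUBMED":
                refs[allele].add(ident.strip().rstrip("."))
        elif code == "//":
            # End of entry: forget the current allele.
            allele = None
    return dict(refs)

# Hypothetical sample in the assumed layout (IDs are placeholders).
SAMPLE = """\
ID   HLA00001; standard; DNA; HUM; 3503 BP.
DE   HLA-A*01010101, Human MHC Class I sequence
RX   PUBMED; 1111111.
RX   PUBMED; 2222222.
//
ID   HLA00002; standard; DNA; HUM; 3000 BP.
DE   HLA-B*5701, Human MHC Class I sequence
RX   PUBMED; 1111111.
//
"""

pairs = allele_pubmed_pairs(SAMPLE.splitlines())
```

Each (allele, PubMed ID) pair could then be written out as one RDF statement linking the allele URI to the article.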

HLA Sequence Feature Variant database

 * (DAIT Data Interoperability committee driven)
 * See https://www.immport.org/immportWeb/display.do?content=ImmPortPub

PDB - protein structure data (3D coordinates)

 * Example - HLA-A*0201/OCTAMERIC TAX PEPTIDE COMPLEX (see Uniprot P01892)
 * Linking opportunities: Proteins via Uniprot; Pubmed; organisms.
 * The XML schema (hairy)
 * XML ftp directory
 * This is big (77G?), with a single file for each structure. There doesn't seem to be any shortcut to getting just the metadata.
 * Outcome: ImmPort/PDB, Bundles/pdb

dbSNP

 * "large but uniform."
 * Mostly sequence data.
 * Linking opportunities: Publication (pubmed), author, organism, possible OBI annotation ("method").
 * Hairy ER diagram.
 * ftp site readme
 * Tables available as ASN, and also as tab-delimited

IEDB (Immune Epitope Database)

 * Bjoern Peters was interested in this - recruit him to do RDF
 * Linking opportunities: Cell type, chemical species, Swiss-Prot, GenBank, PDB, maybe OBI (assay type), patents(?)
 * Related (prior) databases: MHCPep, SYFPEITHI, HIV Sequence Database, JenPep, MHCBN, Corixa, Pangea, Epimmune
 * On ashby: /work/iedb/source/*.xml

MHCBN
MHCBN is a curated database of detailed information about Major Histocompatibility Complex (MHC) binding and non-binding peptides and T-cell epitopes. Version 4.0 of the database provides information about peptides interacting with TAP and about MHC-linked autoimmune diseases. The database is developed by Dr Raghava's group at the Bioinformatics Centre, Institute of Microbial Technology, Chandigarh, India.

''How does this relate to IEDB? Is it an input in the construction of IEDB or what?''


 * EBI SRS version
 * number of entries
 * download database as text
 * on ashby in /work/mhcbn
 * Recent paper: MHCBN 4.0: A database of MHC/TAP binding peptides and T-cell epitopes

Innate DB (pathways)

 * Similar in form to HPRD
 * Incorporates IntAct, DIP, MINT, BIND
 * Linking opportunities: Pubmed, OBI/method (a la PSI-MI), cell type. Interaction participants have full cross-reference info: Ensembl, Unigene, HUGO, OMIM, Entrez Gene, etc.
 * On ashby: /work/innatedb/www.innatedb.ca/download/interactions/*

Medline

 * By simple text matching we can find occurrences of allele names in Medline abstracts. This is useful because many things link to Medline, and Medline is organized by subject headings (MeSH), providing a way to collect sets of papers relating to a particular subject as a starting point for queries.
 * Outcome: Bundles/medline/alleles
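
The text matching described above can be sketched roughly as follows. The regular expression is an assumption about how allele names appear in abstracts (for example `HLA-B*5701`, `HLA B*5701`, or a bare `DRB1*0101`); it is not the pattern actually used to build the bundle, and the function name is hypothetical. A real run would need to handle more spelling variants and filter false positives.

```python
import re

# Assumed shape of an allele mention: optional "HLA-" or "HLA " prefix,
# a locus name (A, B, C, DRB1, DQB1, ...), an asterisk, then 4 to 8 digits
# with an optional one-letter suffix (such as N for a null allele).
ALLELE_RE = re.compile(r"\b(?:HLA[- ])?([A-Z]{1,3}[0-9]?)\*([0-9]{4,8}[A-Z]?)\b")

def find_alleles(abstract):
    """Return the sorted, de-duplicated allele names found in the text,
    normalized to the form HLA-<locus>*<digits>."""
    return sorted(
        {f"HLA-{m.group(1)}*{m.group(2)}" for m in ALLELE_RE.finditer(abstract)}
    )

# Example text assembled from paper titles listed on this page.
text = (
    "HLA B*5701 is highly associated with restriction of virus replication. "
    "Crystal Structure of HLA-B*5701, presenting the self peptide. "
    "A Rheumatoid Arthritis MHC Susceptibility Allele, HLA-DR1 (DRB1*0101)."
)
found = find_alleles(text)
```

Normalizing every hit to one spelling means the two variants of B*5701 above collapse into a single allele, which is what makes the result usable as a join key against MeSH annotations.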

Open Biomedical Ontologies (OBO)

 * Bundles/obo/all
 * Includes Gene Ontology and human phenotype (HPO)

MeSH

 * The MeSH polyhierarchy: Bundles/mesh/mesh-skos
 * Article-to-MeSH annotations: Bundles/medline/subject-headings

Entrez Gene

 * Bundles/ncbi/gene-info, Bundles/ncbi/gene-pubmed

OMIM

 * JAR has converted OMIM allele mentions to RDF. There are only a few hundred (restricting to the ones using the * nomenclature).  Not clear what to do with it.
 * Outcome: Bundles/omim

Other ontologies and thesauruses to be considered

 * Amino acid ontology
 * Human Phenotype Ontology (in OBO, but there are also gene and OMIM links, separately)
 * StemNet ontologies (these include MaHCO, above)
 * UMLS ??

Other databases to be considered

 * Epitope Prediction and Analysis Tools
 * dbMHC - check out the tree view
 * CTD (comparative toxicogenomics)
 * HapMap
 * HLA Informatics Group
 * Human MHC haplogroups in Wikipedia
 * The MHC Haplotype Project - offers a framework and resource for association studies of all MHC-linked diseases. It will provide the complete genomic sequences of at least 8 different HLA-homozygous typing haplotypes (listed below), their resulting variations (SNPs and DIPs) and ancestral relationships.
 * SNPedia
 * TANTIGEN - Tumor T cell antigen database is a data source and analysis platform for cancer vaccine target discovery focusing on human tumor antigens that contain HLA ligands and T cell epitopes. It contains 2005 antigen entries from 251 protein antigens. The database also provides information on T cell epitopes and HLA ligands with full references, gene expression profiles, antigen isoforms, and mutations. Predicted binding peptides of 15 HLA Class I and Class II alleles were also included in the database.

Database lists

 * Molecular immunology databases and data repositories
 * Immunological Databases and Tools
 * Databases on the web - Immunology
 * NAR Immunological databases
 * Immunological databases: MetaDB

Scientific background

 * Wikipedia: Human leukocyte antigen
 * HLA nomenclature system
 * Google: hla peptide docking database
 * Google: hla peptide binding database
 * Quantitative Predictions of Peptide Binding to Any HLA-DR Molecule of Known Sequence: NetMHCIIpan - CD4 positive T helper cells provide essential help for stimulation of both cellular and humoral immune reactions. T helper cells recognize peptides presented by molecules of the major histocompatibility complex (MHC) class II system. HLA-DR is a prominent example of a human MHC class II locus. The HLA molecules are extremely polymorphic, and more than 500 different HLA-DR protein sequences are known today. Each HLA-DR molecule potentially binds a unique set of antigenic peptides, and experimental characterization of the binding specificity for each molecule would be an immense and highly costly task. Only a very limited set of MHC molecules has been characterized experimentally. We have demonstrated earlier that it is possible to derive accurate predictions for MHC class I proteins by interpolating information from neighboring molecules. It is not straightforward to take a similar approach to derive pan-specific HLA-DR class II predictions because the HLA class II molecules can bind peptides of very different lengths. Here, we nonetheless show that this is indeed possible. We develop an HLA-DR pan-specific method that allows for prediction of binding to any HLA-DR molecule of known sequence, even in the absence of specific data for the particular molecule in question.
 * Analysis and predictions of the events involved in antigen processing and presentation - We are analyzing the peptide binding specificity of the HLA class I molecules using several complementing approaches. HLA class I molecules are purified from natural sources, bound peptides are eluted off, and individual peptides as well as motifs are identified by peptide sequencing. In parallel, we are using synthetic peptide and peptide libraries in conjunction with biochemical assays like the above ELISA to generate quantitative data on peptide-HLA binding. These efforts will lead to a detailed, unbiased description of the peptide binding motifs of the most common HLA molecules.
 * Genetic epidemiology / Systemic sclerosis - Short and sweet. Worth reading for an overview of what is known about heritable factors and disease-related genes. (2002)
 * HLA B*5701 is highly associated with restriction of virus replication in a subgroup of HIV-infected long term nonprogressors
 * Predisposition to abacavir hypersensitivity conferred by HLA-B*5701 and a haplotypic Hsp70-Hom variant - This paper and the one above establish a non-obvious connection. How do our analyses treat this?
 * Dominant influence of HLA-B in mediating the potential co-evolution of HIV and HLA
 * Crystal Structure of HLA-B*5701, presenting the self peptide, LSSPVTKSF
 * The Differential Ability of HLA B*5701+ Long-Term Nonprogressors and Progressors To Restrict Human Immunodeficiency Virus Replication Is Not Caused by Loss of Recognition of Autologous Viral gag Sequences
 * Cytokines & Cells Online Pathfinder Encyclopaedia
 * A DNA microarray survey of gene expression in normal human tissues - Follow forward citations to see where this sort of data is accumulated.
 * Systemic and cell type-specific gene expression patterns in scleroderma skin - The paper compares patient and normal gene expression in forearm and back skin, both affected and unaffected. It concludes that gene expression in a patient's skin is very much the same whether or not the skin is affected. Large amounts of B-cell expression. Refers to the concentration of B-cells in rejected kidney transplants. Microarray data available! http://genome-www.stanford.edu/scleroderma
 * Cytochrome P2 polymorphisms and susceptibility to scleroderma following exposure to organic solvents
 * Someone's collection of articles about SSc
 * HuGE Navigator - provides access to a continuously updated knowledge base in human genome epidemiology, including information on population prevalence of genetic variants, gene-disease associations, gene-gene and gene-environment interactions, and evaluation of genetic tests. The Human Genome Epidemiology Network (HuGENet™) is a voluntary, international collaboration focused on assessing the role of human genome variation in health and disease at the population level. Since 2001, HuGENet™ has maintained a database of published, population-based epidemiologic studies of human genes extracted and curated from PubMed. The HuGE Navigator replaces earlier search tools for use with this database and provides additional applications for use by researchers and the public.
 * Disease Features, Disease Type, HLA Types, and Autoantibody Profile in 17 Multicase SSc Families
 * HLA Markers for Susceptibility and Expression in Scleroderma
 * HLA associated genetic predisposition to autoimmune diseases: Genes involved and possible mechanisms
 * Causes of Scleroderma: DNA, Genetics, Race
 * Risk factors for and possible causes of systemic sclerosis (scleroderma)
 * Scleroderma: from cell and molecular mechanisms to disease models
 * John Varga
 * first EULAR/EUSTAR Scleroderma Course
 * Basic Science, Fibrosis
 * SSc and Autoimmunity
 * Report on Workshop 2007
 * Crystallographic Structure of a Rheumatoid Arthritis MHC Susceptibility Allele, HLA-DR1 (DRB1*0101), Complexed with the Immunodominant Determinant of Human Type II Collagen
 * The role of HLA-DQ8 ß57 polymorphism in the anti-gluten T-cell response in coeliac disease
 * Immunoinformatics Comes of Age
 * Relative predispositional effects of HLA class II DRB1-DQB1 haplotypes and genotypes on type 1 diabetes: a meta-analysis

Queries

 * HLA MeSH terms (note that there are >700 results, so you should set the display rows option to "no limit")
 * Papers with an association to one of the 700 MeSH terms - 90K total articles
 * How many articles on the immune system

Ports
[[Media:Occurrences.tgz]] = occurrences of allele names in Medline abstracts, represented as a gzipped tar file containing 700+ RDF/XML files.
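
A minimal sketch for taking a first look inside such an archive using only the Python standard library. It reads each member of a gzipped tar file, parses it as XML, and counts the top-level node elements under the root (normally `rdf:RDF`). The function name is hypothetical and the internal layout of the files in Occurrences.tgz is not assumed beyond their being well-formed RDF/XML; a proper RDF parser would be needed to get at the actual triples.

```python
import io
import tarfile
import xml.etree.ElementTree as ET

def count_nodes(archive):
    """Given a binary file object holding a gzipped tar archive, return a dict
    mapping each member file name to the number of top-level elements under
    its XML root."""
    counts = {}
    with tarfile.open(fileobj=archive, mode="r:gz") as tar:
        for member in tar:
            if not member.isfile():
                continue  # skip directories and other non-file entries
            root = ET.parse(tar.extractfile(member)).getroot()
            counts[member.name] = len(list(root))
    return counts
```

With the archive downloaded locally, usage would be along the lines of `with open("Occurrences.tgz", "rb") as f: print(count_nodes(f))`.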