Semantic resources project/Antibodies

= Antibodies Resource =

Our first resource is the collection and curation of information on commercially and privately available antibodies. Antibodies are an important reagent in many biomedical experiments; however, they are provided by a large number of suppliers and are accompanied by a complex array of metadata and quality information.

We are working to adapt a high-quality, hand-curated set of 20,000+ antibodies, assembled by the AlzForum research forum website, as a semantic resource for biological research. Our curation and modeling are general enough to handle antibody data from other sources (such as commercial suppliers). We are working with the antibody supply community to obtain additional donations of commercial antibody data to this resource.

The technical details of modeling antibodies center on two tasks: representing the antibody's specificity, and identifying citations that detail the use of the antibody (classified by the experimental method for each usage). We are building a relationship with a public ontology effort for proteins, PRO, that will help us represent antibody specificities more precisely than any other large antibody dataset. Furthermore, we are leveraging the existing publication and citation resources from SWAN and Neurocommons to tie the antibody resource to existing datasets through an accurate accounting of research methods.

The antibody resource is aimed at researchers who are trying to discover antibodies specific to a pathway or biological component of neurodegenerative disease, while avoiding the common naming and synonymy problems of large protein datasets. We also aim to let researchers discover the publications and research methods associated with particular antibodies, as a means of evaluating an antibody for use in their own research.

= Data Collection =

Our goal is to format antibodies in an OBI-compliant manner; to do this, we need information about resources that are available either commercially or through private labs.

AlzForum Antibody Dataset
We have collected raw data on 20,000+ antibodies from two different contributed sources. The first is the raw data behind the AlzForum Antibodies databank, which contains a hand-curated set of antibody entries relevant to Alzheimer's and neurodegenerative disease research. The second is a private dataset contributed by a commercial antibody supplier.

We have outlined an ontology for the representation of this data. As part of building this ontology, we have identified several key features of the ultimate resource. For example, multiple suppliers may sell the same antibody under different catalog numbers, which necessitates separating the "offer" to sell the antibody from the antibody itself. This modeling will make it possible for us to ultimately include data from many suppliers in this resource, and to accurately resolve antibody references across publications and between different research laboratories.
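The separation of the "offer" from the antibody can be illustrated with a minimal sketch. The class names, field names, identifiers, and supplier names below are hypothetical and chosen only for illustration; they are not the project's actual ontology terms.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Antibody:
    antibody_id: str  # hypothetical internal identifier
    target: str       # normalized name of the protein the antibody is specific for


@dataclass(frozen=True)
class Offer:
    supplier: str        # organization selling the antibody
    catalog_number: str  # supplier-specific catalog number
    antibody_id: str     # reference to the antibody being offered


def offers_by_antibody(offers):
    """Group offers by the antibody they refer to."""
    grouped = defaultdict(list)
    for offer in offers:
        grouped[offer.antibody_id].append(offer)
    return dict(grouped)


# Illustrative data: one antibody sold by two suppliers under different catalog numbers.
antibody = Antibody("ab-0001", "example protein")
offers = [
    Offer("Supplier A", "A-123", "ab-0001"),
    Offer("Supplier B", "B-987", "ab-0001"),
]
grouped = offers_by_antibody(offers)
```

Because each offer only points at an antibody, two catalog entries from different suppliers resolve to the same antibody record rather than creating two antibodies.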

We have used a text-mining and searching software infrastructure to identify protein names in the free-text data fields of the identified antibody datasets. Identifying antibody specificity (the protein or proteins to which an antibody is specific, the key experimental feature of an immunochemical experiment) is a central aspect of the modeling and representation of antibodies themselves. Both of the antibody datasets we have identified so far include specificity descriptions as textual fields without a consistent format and using protein identifiers (common names) that are sometimes ambiguous. We have used open source software components and standardized protein tagging methodology to identify relevant protein terms from these textual fields, and to annotate the antibodies as specific for normalized versions of those protein names.
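As a rough illustration of this kind of protein tagging, the sketch below does a dictionary lookup of common names in a free-text specificity field and maps each hit to a normalized identifier. It is an assumption-laden simplification, not the software actually used: the synonym table is a tiny illustrative sample, and the function name `tag_proteins` is hypothetical.

```python
import re

# Illustrative synonym table (an assumption, not the project's dictionary):
# lower-cased common names mapped to a normalized gene symbol.
SYNONYMS = {
    "amyloid precursor protein": "APP",
    "app": "APP",
    "tau": "MAPT",
    "mapt": "MAPT",
}


def tag_proteins(text, synonyms=SYNONYMS):
    """Return the sorted normalized names of proteins mentioned in free text."""
    lowered = text.lower()
    found = set()
    for name, normalized in synonyms.items():
        # Match whole words only, so that "app" does not match inside "apple".
        pattern = r"(?<!\w)" + re.escape(name) + r"(?!\w)"
        if re.search(pattern, lowered):
            found.add(normalized)
    return sorted(found)


tags = tag_proteins("Recognizes human Tau and amyloid precursor protein (APP)")
```

A plain dictionary lookup like this cannot resolve genuinely ambiguous common names; it only shows how several surface forms collapse onto one normalized term.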

We have interacted with PRO in building a social process for generating terms, in bulk, for the targets of these antibodies. As we identify the proteins and peptides to which the antibodies in our initial datasets show specificity, our goal has been to annotate those antibodies as specific to protein terms from the PRO ontology described above. However, the PRO ontology is currently in development and partially incomplete; some of the proteins for which we have specific antibodies are not present in PRO. In order to move forward with our modeling, we have worked with PRO to identify the key proteins which that ontology lacks, and to request new terms for those proteins.

We have worked to identify a set of research methods for indexing the use of antibodies in experimental protocols. Understanding antibodies in relation to existing biomedical experiments requires understanding how those antibodies have already been used in published experiments. We have worked to identify a complete set of immunochemical experimental methods under which the majority of antibody uses recorded in our datasets will fall. These methods can then be modeled themselves, and submitted to existing biomedical ontologies (OBI).
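Indexing antibody use by method can be sketched as a grouping of usage records. The method list below is a small illustrative subset, not the complete set described above, and the record layout, function name, and citation labels are hypothetical placeholders.

```python
from collections import defaultdict

# Illustrative subset of immunochemical methods (an assumption for this sketch).
METHODS = {"western blot", "immunohistochemistry", "elisa", "immunoprecipitation"}


def index_usage(usages):
    """Group citations by antibody, then by normalized method name.

    `usages` is an iterable of (antibody_id, method, citation) tuples.
    Methods outside the known set are filed under "unclassified".
    """
    index = defaultdict(lambda: defaultdict(list))
    for antibody_id, method, citation in usages:
        key = method.strip().lower()
        if key not in METHODS:
            key = "unclassified"
        index[antibody_id][key].append(citation)
    return {ab: dict(by_method) for ab, by_method in index.items()}


# Placeholder records; the citation labels are not real publication identifiers.
usage_index = index_usage([
    ("ab-0001", "Western blot", "citation-1"),
    ("ab-0001", "ELISA", "citation-2"),
    ("ab-0001", "western blot ", "citation-3"),
    ("ab-0002", "dot blot", "citation-4"),
])
```

With an index of this shape, a researcher could ask which publications used a given antibody under a given method.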


 * Test data set
 * Antibody Use Methods
 * Proteins and Fragments Related to Alzheimers
 * Preliminary work
 * Alzforum to PMID
 * Competency Questions

NIF Antibody Dataset

 * /NIF Antigen Mapping

= Tools & Workflow =


 * Workflow Notes

Protein Name Mining
The first step in representing antibody data is to find and represent the specificity of the antibody. This often means finding precise protein names (or the names of genes corresponding to proteins) within larger free-text fields.

Mining Protein Names

Manual Annotation
The Antibody Record Annotator provides a method for converting relational antibody data sources with free-text fields into a structured representation suitable for conversion into a Neurocommons Bundle.

Hand annotation is the first step: free-text fields are mined, either automatically or by a human annotator, for substrings that represent structured knowledge about the antibody.

Automatic Rule-based Annotation
This knowledge is then transferred to the larger set of antibodies through automatic rule matching.

An Annotation Cache is used to collect the associations of text strings with structured annotations from the Manual Annotation stage. These annotations are then used as text rules for the unannotated corpus of antibody records.
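The idea of reusing manually annotated strings as text rules can be sketched as follows. This is a minimal illustration under stated assumptions: the class `AnnotationCache` here is a hypothetical stand-in, not the project's implementation, and the annotation fields (`field`, `value`) are invented for the example.

```python
class AnnotationCache:
    """Stores text-string-to-annotation associations and applies them as rules."""

    def __init__(self):
        self._rules = {}  # lower-cased text string -> structured annotation

    def add(self, text, annotation):
        """Record an association made during manual annotation."""
        self._rules[text.lower()] = annotation

    def annotate(self, record_text):
        """Return every cached annotation whose text string occurs in the record."""
        lowered = record_text.lower()
        return [ann for text, ann in self._rules.items() if text in lowered]


cache = AnnotationCache()
cache.add("western blot", {"field": "method", "value": "western blot"})
cache.add("mouse monoclonal", {"field": "host", "value": "mouse"})

# Apply the cached rules to a record that was never annotated by hand.
hits = cache.annotate("Mouse monoclonal; validated by Western blot.")
```

Simple substring matching will over-match on short strings, so a real rule set would likely need word-boundary checks or curator review of the rules it generates.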

= Data Model =

Creation




Offer to Sell


= OBI Model =

There are five core elements needed to represent an antibody:
 * 1) the antibody itself
 * 2) an antigen for an antibody
 * 3) the solution which is offered by an organization
 * 4) the offering organization
 * 5) the process by which the antibody was created

We need URLs for properties which connect these classes.

We need OBI descriptions of each class, to describe their meaning and relationship with each other.


 * Antibody OBI Model
 * Methods in OBI

= Neurocommons Bundle =

/Bundle Code

= Notes & External Content =

Prior Work

 * Previous work on Alzforum antibodies conversion
 * Data standards for minimum information collection for antibody therapy experiments
 * Minimum Information about a Protein Affinity Reagent (MIAPAR)
 * Report: A community standard format for the representation of protein affinity reagents. (about MIAPAR)
 * Yong et al. Data standards for minimum information collection for antibody therapy experiments. Protein Engineering Design and Selection (2009) vol. 22 (3) pp. 221-4
 * Browsable version of their schema

AlzForum Content

 * Review of Alzforum desperately seeking antibodies
 * Discussions with Don Hatfield

OBI Content

 * Beginning discussion with OBI
 * MIREOT: Minimal Information to Reference External Ontology Terms

Other Content

 * Antibodypedia
 * ProteomeBinders : "a European consortium proposing to establish a comprehensive infrastructure resource of binding molecules for detection of the human proteome, together with tools for their use and applications in studying proteome function and organisation."
 * IMGT : ImMunoGeneTics
 * IEDB Source Pages (ex. Mouse)
 * CHDI HTT Antibody List (it's in PDF)