Semantic resources project/MouseModels/Expressing it in RDF

= How to express the MGI resource in RDF =

To generate useable RDF we need templates for expressing what has been learned by reading the various files from the MGI FTP site.

Look at Terry's strain ontology.

Add processes as needed to OBI.

Strains
For each strain that has an MGI id, we need a URI. (We probably also need to think about what the broker's temporary URIs ought to be in this case.)

Possible approach: request an obolibrary namespace.
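A minimal sketch of minting a strain URI from an MGI id, assuming the local name is the MGI id with the colon replaced by an underscore (as in s:MGI_2165578). The namespace below is a hypothetical placeholder, since no namespace has been requested yet, and the function names are illustrative only.

```python
import re

# Hypothetical placeholder; the real namespace is whatever ends up being requested.
STRAIN_NS = "http://example.org/strain/"

_MGI_ID = re.compile(r"^MGI:(\d+)$")


def mgi_local_name(mgi_id: str) -> str:
    """Turn 'MGI:2165578' into the local name 'MGI_2165578'."""
    match = _MGI_ID.match(mgi_id.strip())
    if match is None:
        raise ValueError(f"not an MGI id: {mgi_id!r}")
    return "MGI_" + match.group(1)


def strain_uri(mgi_id: str, namespace: str = STRAIN_NS) -> str:
    """Build a strain URI by appending the local name to the namespace."""
    return namespace + mgi_local_name(mgi_id)


print(strain_uri("MGI:2165578"))
```

The same helper would serve for alleles by passing a different namespace.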

Alleles
For each "allele" that has an MGI id, we need a URI.

Possible approach: request an obolibrary namespace.

Laboratories
For each laboratory known to ILAR and occurring in an MGI strain or allele name, we need a URI.

Possible approach: request terms in the ORG namespace. http://obolibrary.org/obo/ORG_12345

Sites (= strain repositories = suppliers)
One URI for each of the IMSR sites (small integers, about 20 of them).

Possible approach: request terms in the ORG namespace. http://obolibrary.org/obo/ORG_12345

Currently in the ORG namespace, each organization has:
 * rdfs:label e.g. "Laboratory of Hans Frei at Tuebingen University"
 * definition source
 * foaf:homepage

Offers to sell
One URI for each Jax stock number, or the equivalent for the other suppliers. Do these need URIs at all? Why would they be the subject or object of an RDF statement? Absent a reason, JAR prefers blank nodes. Alan's idea: COMMERCE_ilar+stock

Types and relationships
We need to know how to express this information in RDF, either as a single class or relation, or as a more complex pattern (restrictions and so on).

Strain to MGI id
Relation between a strain and the MGI id that names it:

 s:MGI_2165578 :hasMgiId "MGI:2165578"

Don't assert this, as the MGI id is in the URI itself. (If it has to be asserted, maybe use the sciencecommons id relation?)
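If the MGI id is not asserted, a consumer can still recover it from the URI. A minimal sketch, assuming the URI (or prefixed name) ends in a local name of the form MGI_ followed by digits; the function name and the example namespace are hypothetical.

```python
import re

# Assumes the local name is 'MGI_' followed by digits, at the end of the URI.
_LOCAL_NAME = re.compile(r"MGI_(\d+)$")


def mgi_id_from_uri(uri: str) -> str:
    """Recover 'MGI:2165578' from a URI or prefixed name ending in 'MGI_2165578'."""
    match = _LOCAL_NAME.search(uri)
    if match is None:
        raise ValueError(f"no MGI local name in {uri!r}")
    return "MGI:" + match.group(1)


print(mgi_id_from_uri("s:MGI_2165578"))
```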

Strain to strain name
Relation between a strain and its official strain name: use rdfs:label?

 s:MGI_2165578 rdfs:label "B10.PL/(73NS)Sn-Dst/J"

Note: this may be the only place where the strain's "serial number" is recorded [current :hasSerialNumber].
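A small sketch of writing the label statement as a line of Turtle, assuming rdfs:label is the property chosen. Strain names can contain characters that need escaping inside a quoted literal, so the helper escapes backslashes, double quotes, and line breaks. The function names are illustrative.

```python
def turtle_literal(text: str) -> str:
    """Quote a string as a Turtle literal, escaping the characters that require it."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return '"' + escaped + '"'


def label_triple(subject: str, label: str) -> str:
    """Return one Turtle statement giving the subject its rdfs:label."""
    return f"{subject} rdfs:label {turtle_literal(label)} ."


print(label_triple("s:MGI_2165578", "B10.PL/(73NS)Sn-Dst/J"))
```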

Alan: If parts of the name are important, consider turning a strain *name* into an information artifact that denotes the strain.

Strain types
Types of strains: need to express this information somehow as it is important and curated by MGI: s:MGI_2165578 :hasStrainType "congenic".

Most of the strain types have to do with the process by which the strain was created, e.g. every member is the specified output of a xxx process (which might include subsequent breeding).

Here are all the strain types:
 * Not Specified
 * coisogenic
 * congenic
 * conplastic
 * consomic
 * inbred strain
 * recombinant congenic (RC)
 * recombinant inbred (RI)

These are all documented in the MGI strain nomenclature guide.

Strain in custody of laboratory
We need a way to say that a strain is defined (in part) as being one that is in the custody of a certain laboratory, e.g. "/J" in "B10.PL/(73NS)Sn-Dst/J" means the strain is maintained at Jax. [Current: :hasLabCode]
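As a rough illustration of where the custody information comes from, the sketch below pulls the text after the last "/" out of a strain name. This is a naive assumption, not the actual name parser: for a name such as "C57BL/6J" it would return "6J", mixing a substrain number with the lab code, so a real implementation needs the full nomenclature rules. The function name is hypothetical.

```python
from typing import Optional


def trailing_lab_code(strain_name: str) -> Optional[str]:
    """Return the text after the last '/', or None if there is no such segment.

    Naive: treats the whole final segment as the lab code.
    """
    _head, sep, tail = strain_name.rpartition("/")
    if not sep or not tail:
        return None
    return tail


print(trailing_lab_code("B10.PL/(73NS)Sn-Dst/J"))
```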

Strain derived from, or with the help of, strain
For coisogenic strains, we need one strain-strain relationship "S1 is the background strain from which S2 was derived". (An input to some process.) [Current :hasBackground]

For congenic strains, we need three relationships: "has host", "has donor", "has helper".

For several different strain types (at least coisogenic or congenic), we need the relationship between the strain and the allele that it "has" (the part of its genetic makeup that the background or host doesn't have). [Current :hasAllele]

Many strain names contain the special symbol "Cg". I've recorded this in the parse tree, but I don't see any reason to retain it as it simply says that some information (identity of donor or helper strain) was not recorded.

Alleles
Example from phenotypic allele report:

 a:MGI_2384111 :hasMgiId "MGI:2384111" ;
     :hasAlleleSymbol "a<17R>" ;                  -- rdfs:label
     :hasAlleleName "nonagouti 17 Oak Ridge" ;    -- IAO alternative term
     :hasAlleleType "Radiation induced" ;         -- see below
     :hasAlleleArticle  ;
     :hasAlleleMarker m:MGI_87853 ;               -- this leads to Entrez Gene, genome, etc.
     :hasAlleleEnsembl "ENSMUSG00000027596" ;     -- don't record
     :hasAllelePhenotype mp:MP_0001186 ;
     :hasAllelePhenotype mp:MP_0005393 ;
     :hasAlleleSynonym "58DSD" ;                  -- don't record ?
     :hasAlleleSynonym "a<58DSD>".                -- don't record ?
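The annotations in the example suggest a mapping from report fields to properties: the symbol becomes rdfs:label, the name becomes an IAO alternative term, and the Ensembl id and synonyms are dropped. A minimal sketch of that mapping follows. The dictionary keys, the function names, and the choice of obo:IAO_0000118 for "alternative term" are assumptions; the article field is left out because its value is not shown above.

```python
# Assumed mapping from report fields to properties; fields not listed are dropped.
PROPERTY_MAP = {
    "symbol": "rdfs:label",
    "name": "obo:IAO_0000118",  # assumed to be IAO 'alternative term'
    "type": ":hasAlleleType",
    "marker": ":hasAlleleMarker",
    "phenotypes": ":hasAllelePhenotype",
}
# Fields whose values are already prefixed names rather than strings.
RESOURCE_FIELDS = {"marker", "phenotypes"}


def quote(text: str) -> str:
    """Quote a string as a Turtle literal."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def allele_turtle(subject: str, record: dict) -> str:
    """Serialize the mapped fields of one allele record as a Turtle statement."""
    pairs = []
    for field, prop in PROPERTY_MAP.items():
        if field not in record:
            continue
        values = record[field]
        if isinstance(values, str):
            values = [values]
        for value in values:
            obj = value if field in RESOURCE_FIELDS else quote(value)
            pairs.append(f"{prop} {obj}")
    if not pairs:
        raise ValueError("record has no mapped fields")
    return subject + " " + " ;\n    ".join(pairs) + " ."


record = {
    "symbol": "a<17R>",
    "name": "nonagouti 17 Oak Ridge",
    "type": "Radiation induced",
    "marker": "m:MGI_87853",
    "ensembl": "ENSMUSG00000027596",  # not mapped, so not recorded
    "phenotypes": ["mp:MP_0001186", "mp:MP_0005393"],
    "synonyms": ["58DSD", "a<58DSD>"],  # not mapped, so not recorded
}
print(allele_turtle("a:MGI_2384111", record))
```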

Not alleles?
Need to decide what to do with alleles that aren't really alleles, e.g. [:hasAlleleSymbol "(D1Mit65-D1Mit334)"]. Chuck them pending some way to parse and talk about them?
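One way to "chuck them" for now is to set such entries aside during loading. The sketch below uses a purely hypothetical heuristic, based only on the single example above: a symbol wrapped entirely in parentheses is treated as not being a real allele. Other non-allele forms may exist and would need their own tests.

```python
def looks_like_non_allele(symbol: str) -> bool:
    """Heuristic (assumption): a fully parenthesized symbol is not a real allele."""
    stripped = symbol.strip()
    return stripped.startswith("(") and stripped.endswith(")")


def split_alleles(symbols):
    """Partition symbols into (kept, set_aside) using the heuristic above."""
    kept, set_aside = [], []
    for symbol in symbols:
        (set_aside if looks_like_non_allele(symbol) else kept).append(symbol)
    return kept, set_aside


kept, set_aside = split_alleles(["a<17R>", "(D1Mit65-D1Mit334)"])
print(kept, set_aside)
```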

Allele to phenotype
We need to express the relationship between an allele and the phenotype (from MP) it seems to induce.

Allele types
Would be good to have a way to express the "type" of an allele as recorded in the allele report.

Here are all the allele types: -- Alan: most are planned processes
 * Chemically and radiation induced
 * Chemically induced (ENU)
 * Chemically induced (other)
 * Gene trapped
 * Not Applicable
 * Not Specified
 * QTL                 -- quantitative trait locus.  investigate.
 * Radiation induced
 * Spontaneous         -- unplanned process.
 * Targeted (Floxed/Frt)
 * Targeted (Reporter)
 * Targeted (knock-in)
 * Targeted (knock-out)
 * Targeted (other)
 * Transgenic (Cre/Flp)
 * Transgenic (Reporter)
 * Transgenic (Transposase)
 * Transgenic (random, gene disruption)
 * Transgenic (random, expressed)
 * Transposon induced

IMSR - strains for sale
Each catalog entry constitutes an offer to sell. Do offers need URIs?

 []  :hasStrainOrStock "C57BL/6J-hstp/J";
     :hasSite "JAX";
     :hasState "live mouse";
     :hasType "mutant strain";
     :hasChr "19";
     :hasMutation "spontaneous mutation";    -- seems to be derived from allele type
     :hasAlleleSymbol "hstp";
     :hasAlleleName "high stepper";
     :hasGeneName "high stepper".

Offer being made by
Each IMSR supplier has its own catalog file, and its own site id (small integer, e.g. Jax = 1). We need to be able to express that an offer is being made by some supplier (organization).
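A sketch of one possible pattern: keep each offer as a blank node and link it to the supplier through the site id. The property name :offeredBy and the site namespace are hypothetical placeholders; only the fact that Jax has site id 1 comes from the text above.

```python
# Hypothetical placeholder namespace for supplier (site) URIs.
SITE_NS = "http://example.org/site/"


def literal(text: str) -> str:
    """Quote a string as a Turtle literal."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def offer_turtle(site_id: int, fields: dict) -> str:
    """Serialize one catalog entry as a blank node offered by the given site."""
    pairs = [f":offeredBy <{SITE_NS}{site_id}>"]  # :offeredBy is an assumed name
    pairs += [f"{prop} {literal(value)}" for prop, value in fields.items()]
    return "[] " + " ;\n   ".join(pairs) + " ."


print(offer_turtle(1, {":hasStrainOrStock": "C57BL/6J-hstp/J", ":hasState": "live mouse"}))
```

If offers later turn out to need URIs, only the leading "[]" would change.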

Strain types
Here are strain type [:hasType] values. These appear to give information that's similar to, but sometimes more detailed than, what we find in the MGI strain list.


 * coisogenic strain
 * congenic strain
 * consomic or chromosome substitution strain
 * hCYP1A1_1A2_Cyp1a1/1a2(-/-)_Ahr mutant mice
 * inbred strain
 * major histocompatibility congenic
 * minor histocompatibility congenic
 * mutant stock
 * mutant strain
 * recombinant congenic
 * recombinant inbred
 * segregating inbred
 * unclassified
 * wild-derived

State in which provided
Might be nice to capture this info from the IMSR reports.

Values taken on by :hasState property:


 * ES cell lines
 * cryopreserved embryos
 * live mouse