Semantic resources project/MouseModels

= Mouse Models of Disease =

The "mouse models" resource collects information about inbred mouse strains used for biomedical research, along with metadata about the phenotype, disease models, genetic alleles, suppliers, and publication history of each strain. We have begun to identify available web sources of mouse model information. The mouse model ontology that we develop will contain connections to the Sequence Ontology for genetic mutant and allele information, the Mammalian Phenotype ontology, and one or more gross anatomy and disease ontologies.

Mouse models matter not only because they let researchers easily find research specimens for diseases and conditions, but also because they are a key component of the metadata associated with publicly available datasets. Genomic and genetic experiments performed in mice can only be understood in the context of the precise genetic background and mouse strain in which the experiments were performed.

Specific manual curation is out of scope, although it may be necessary to give certain central entities (e.g. ancestral strains) manual attention. Creation of this resource will require some combination of the following:
 * 1) generic ontology work, so that we have a theory of the information at hand that harmonizes with that of other resources
 * 2) script writing: convert MGI and Jax data tables to RDF
 * 3) script to harvest Jax and PubMed references from Alzforum mouse model list
 * 4) scraping various text sources for a variety of information
 * 5) parsing strain nomenclature to extract relationships

= Use Cases & Desiderata =

Discussion on obi-developer
The idea of a mouse strain ontology was discussed on obi-developer in January 2010.

Alzforum mouse models (in HTML, not db-backed) - might be nice to express this information in RDF for easier query and reuse.

OBO Mouse Strain google group is a discussion group with representatives of a number of groups who need a mouse strain resource, as well as representatives of MGI.

Selected threads:
 * Use cases (subject line is "requirements")
 * PD Online mouse model requirements
 * Some definitions
 * How to find a strain at MGI

Following is TWD's summary of information involved in the use cases, abstracted from the google group mailing list discussions.


 * 1) Information usually parseable from the strain identifier
 * 2) Congenic/incipient (# of backcrosses)
 * 3) "strain origin" -- derivation, including lineage
 * 4) attachment to taxon (in these use cases, always Mus musculus)
 * 5) terms for groups of strains, such as "BL/6" which corresponds to sets of strains (usually related to lineage)
 * 6) Genetic content of a strain
 * 7) "genetic background" (related to genotype of ancestor strain)
 * 8) loci+alleles at loci (+origin of those alleles: e.g. "human SNCA") (complexity around heterozygosity and polyploidy)
 * 9) Phenotype of a strain
 * 10) (human) disease modeled, cf. OMIM HPO
 * 11) interesting non-disease phenotype, e.g. color
 * 12) for each phenotype noted, any known association to allele (this may be generic to the allele, not specific to the strain)
 * 13) toxin response (not sure what this means exactly) (PD Online)
 * 14) "missing phenotypes" -- "what features of human PD are missing from the model?" (PD Online)
 * 15) "expression patterns" (M. Rogan) (probably not realistic for this project)
 * 16) Guidelines related to strain
 * 17) "feeding guidelines" (maybe different from one supplier to next)

Deep dive into one or two examples: /Bottom-up

Ontological considerations
It appears there is a real need (in the broader community) not only for data conversion and curation but also for ontology ("modeling").

We speak of "strains" but the term is poorly defined. On the other hand maybe it doesn't need to be defined. Ontologically the class "mouse" and the relation "parent" seem pretty well defined. We should build on these and speak not of 'strains' but of classes of mice. In classic DL tradition a class is best defined not by fiat but according to the properties shared by, and exclusive to, its members, e.g. the process(es) that produced them, descent, genotype, phenotype, supplier, etc.

(Of course even in BFO there is room for particulars that consist of a number of mice. The null hypothesis should be that we don't need these, but we could introduce them if logically forced to.)

= Data Sources =

/Questions about the content and meaning of the MGI and Jax data sources


 * Mouse Genome Informatics (MGI) (at Jax)
 * Sample transgene detail
 * Genealogy chart of inbred strains (PDF)
 * Inbred Mouse/Rat Search Form: (No idea if this is out of date or not.)
 * MGI FTP site: includes many reports (tab-delimited) drawn from their database, including a strain report. (If we ask nicely maybe we can get the whole database?) "Any reproduction or use for commercial purpose is prohibited without the prior express written permission of the Jackson Laboratory."
 * International Mouse Strain Resource (IMSR) (at Jax): Indexes mouse strains available at JAX, EMMA, and elsewhere -- searchable, tabular output, links back to the original data sheets. data here
 * Mouse Phenome Database (MPD) (at Jax): Reports phenotype data for ~40 of the common inbred mouse strains. data here
 * Particular repositories
 * Jax Mice portal.
 * Browse Jax mice - the information in the result sets is available in the Jax IMSR report, except for stock number and price. (Judy has provided stock number / strain table)
 * C57BL/6J : an example JAX mouse datasheet
 * European Mouse Mutant Archive (EMMA)
 * EMMA Strain Search/Browse
 * MMRRC: Mutant Mouse Regional Resource Centers
 * Taconic : appears to have an "animal models" section, including mouse models.
 * Tau mouse strain : comes up when searching for models of Alzheimer's.
 * Alzforum mouse models (data in HTML, not db-backed)
 * IEDB : the Immune Epitope Database also contains information about mouse models and strains.

= Data Model =

Jonathan's notes:
 * A/Bc and A/Bc/De are now subclasses of A
 * A/Bc not a substrain of A
 * Now using URIs for alleles in strain file, with blank node notation when none exists
 * Now using URIs for markers ... maybe a bad idea ... and the marker file isn't dealt with yet
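
A minimal sketch of the subclass note above, assuming the root strain is simply the text before the first "/" in the name. The function names are hypothetical and this is not the actual script; real names with "/" in other positions (e.g. inside genotype notation) would need more care.

```python
def root_strain(name):
    """Return the part of a strain name before the first '/' (assumed root)."""
    return name.split("/", 1)[0]


def subclass_pairs(names):
    """Yield (strain, root) pairs for every name that carries a '/' suffix."""
    for name in names:
        root = root_strain(name)
        if root != name:
            yield (name, root)


# Both A/Bc and A/Bc/De come out as subclasses of A, matching the note.
pairs = list(subclass_pairs(["A", "A/Bc", "A/Bc/De"]))
```

Each pair could then be written out as an rdfs:subClassOf statement between the two strain classes.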

BIB_PubMed.ttl
@prefix mgi: 

There are three properties in this file:
 * 1) mgi:hasMgiId
 * 2) mgi:hasReferencePmid
 * 3) mgi:hasReferenceAltId
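
A rough sketch of how one reference entry using these three properties might be serialized. Everything not named above is an assumption: the namespace URI is a placeholder (the real `mgi:` prefix is not given here), the subject naming is invented, the values are dummy identifiers, and treating all three values as plain string literals is a guess.

```python
# Placeholder namespace; the real mgi: prefix URI is not recorded on this page.
MGI_NS = "http://example.org/mgi#"
PREFIX_LINE = f"@prefix mgi: <{MGI_NS}> ."


def reference_to_turtle(local_id, mgi_id, pmid=None, alt_id=None):
    """Build one Turtle statement for a reference (hypothetical helper)."""
    parts = [f'mgi:{local_id} mgi:hasMgiId "{mgi_id}"']
    if pmid is not None:
        parts.append(f'    mgi:hasReferencePmid "{pmid}"')
    if alt_id is not None:
        parts.append(f'    mgi:hasReferenceAltId "{alt_id}"')
    return " ;\n".join(parts) + " ."


# Dummy identifiers, not real records.
statement = reference_to_turtle("ref1", "MGI:0000000", pmid="00000000")
```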

parsed-strains.ttl
= OBI Model =

= Expressing it in RDF =

See /Expressing it in RDF for notes on how to render selected MGI-derived information into RDF.

= External Resources & Notes =


 * Strain Name
 * DB Ref #'s
 * JAX
 * EMMA
 * Others
 * Supplier
 * Supplier List (and older list, taken from the Silver book below)
 * Availability?
 * Common Names (Synonyms)
 * Notes/Descriptions
 * Type (Inbred, Congenic, etc.)
 * Mating System
 * Species/Taxonomic reference
 * H2 Haplotype?
 * Genotype/Alleles
 * gene name
 * allele name
 * strain_of_origin
 * note(s)
 * Appearance
 * Disease Models
 * Derived_from Strains
 * has_phenotype
 * Publication references
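
The fields listed above could be held in a simple record type while the ontology is still being worked out. This is only a sketch: the class and attribute names are made up for illustration, and the grouping of gene name, allele name, strain of origin and notes under a single allele entry is an assumption about how the list is meant to nest.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Allele:
    gene_name: str
    allele_name: str
    strain_of_origin: Optional[str] = None
    notes: List[str] = field(default_factory=list)


@dataclass
class StrainRecord:
    strain_name: str
    db_refs: Dict[str, str] = field(default_factory=dict)  # e.g. "JAX", "EMMA"
    suppliers: List[str] = field(default_factory=list)
    common_names: List[str] = field(default_factory=list)
    notes: List[str] = field(default_factory=list)
    strain_type: Optional[str] = None      # Inbred, Congenic, etc.
    mating_system: Optional[str] = None
    taxon: Optional[str] = None
    h2_haplotype: Optional[str] = None
    alleles: List[Allele] = field(default_factory=list)
    appearance: Optional[str] = None
    disease_models: List[str] = field(default_factory=list)
    derived_from: List[str] = field(default_factory=list)
    phenotypes: List[str] = field(default_factory=list)
    publications: List[str] = field(default_factory=list)


# One of the strain rows quoted elsewhere on this page.
record = StrainRecord(
    strain_name="B10.D2-H11<?>/(55N)SnJ",
    strain_type="congenic",
    db_refs={"MGI": "MGI:2163767"},
)
```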

JAX Lab Information
Each page on the JAX lab website for a particular mouse strain has eight sections:
 * 1) Description
 * 2) Phenotype
 * 3) Genes & Alleles
 * 4) Genotyping
 * 5) References
 * 6) Health & husbandry
 * 7) Purchasing information
 * 8) Terms of Use

Description
The Description section contains free-form text fields, as well as references to other "related" strains (some of which are indexed according to the allele which they share with this strain).

The Description section contains the following entries (not present in all strains):
 * Strain Name
 * Stock Number
 * Availability
   * Levels 1-??
   * Cryopreserved -- "ready for recovery"
 * Former Names
 * Type
   * Inbred
   * Mutant Strain
   * Spontaneous Mutation
   * Conisogenic
 * Species
   * Laboratory Mouse
 * H2 Haplotype
 * Appearance (free text)
 * Description (free text)
 * Development (free text)
 * Related Strains
   * Listed for each allele, and for variants of the alleles.

Phenotype
The Phenotype section describes, in semi-free text (the terms may be controlled, but they are arranged in an outline format), the diseases, conditions, or observed phenotypes that are relevant to or exhibited by this strain.

Genes & Alleles
The Genes & Alleles section contains one subsection for each allele that is consistently carried by members of this strain, listing the features of that allele.


 * Allele Symbol
 * Links back to the MGI database; cf. Dock7m
 * Allele Name
 * Allele Type
 * Common Name(s)
 * Strain of Origin
 * Gene Symbol and Name
 * Chromosome
 * Gene Common Name(s)
 * Strain of origin
 * Molecular Note

Health & Husbandry
Health and husbandry contains reports on how these strains are bred, fed, and maintained.

Other Sections
I'm not sure that Purchasing Information and Terms of Use vary from strain to strain.

OBO Ontologies

 * Cell Type (CL)
 * Common Anatomy Reference Ontology (CARO)
 * Foundational Model of Anatomy (FMA)
 * Mammalian Phenotype (MP)
 * Mouse Adult Gross Anatomy (MA)
 * Mouse Gross Anatomy and Development (EMAP)
 * Mouse Pathology (MPATH)
 * Phenotypic Quality (PATO)
 * Sequence Types and Features (SO)
 * Human Disease (DOID)

Other Notes
Some of the "disease model" links for certain alleles in the MGI database link back to the OMIM (Online Mendelian Inheritance in Man) database from NCBI. For example, the Dock7 allele, above, links to the "Storage Pool Platelet Disease" page in OMIM.

Several NIF ontologies are probably also relevant:

 * DPO: Disease Phenotype Ontology
 * NeuroLex Disease Hierarchy

Brainstorming on Terms
Silver's "Mouse Genetics"

Strain Nomenclature
Strain Nomenclature Guide

MGI Mouse Nomenclature

Our /Nomenclature parsing project

Examples
C3H/HeJ-ruf (F?+25)

Explanation: strain/substrain (generation indicator). The strain is "C3H" and the substrain is "HeJ". "F?+25" indicates an unknown number of (inbreeding) generations, but known to be more than 25.
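
A minimal sketch of pulling these pieces out of a name of this shape. It assumes the pattern STRAIN/SUBSTRAIN, an optional hyphenated suffix, and an optional parenthesized generation indicator; the regular expressions and function names are illustrative only, and the "ruf" suffix is captured without being interpreted. Most real strain names will not fit so simple a pattern.

```python
import re

# Assumed shape: STRAIN/SUBSTRAIN[-SUFFIX] [(GENERATION)]
NAME_RE = re.compile(
    r"^(?P<strain>[^/\s]+)/(?P<substrain>[^-\s]+)"
    r"(?:-(?P<suffix>\S+))?"
    r"(?:\s+\((?P<generation>[^)]+)\))?$"
)
# Assumed shape: F<count>, or F?+<count> when earlier generations are unknown
GENERATION_RE = re.compile(r"^F(?P<unknown>\?\+)?(?P<count>\d+)$")


def parse_strain(text):
    """Return a dict of name parts, or None if the text does not fit."""
    match = NAME_RE.match(text.strip())
    return match.groupdict() if match else None


def parse_generation(indicator):
    """Return (has_unknown_generations, known_count), or None."""
    match = GENERATION_RE.match(indicator)
    if match is None:
        return None
    return (match.group("unknown") is not None, int(match.group("count")))


parsed = parse_strain("C3H/HeJ-ruf (F?+25)")
generation = parse_generation(parsed["generation"])
```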

Exotic examples
Here are IDs containing the rare characters '?' and ':':
 * MGI:2161806	B10.C-H3 H13<?> A/(28NX)SnJ	congenic
 * MGI:2163767	B10.D2-H11<?>/(55N)SnJ	congenic
 * MGI:3710502	B6Smn(C3)-Fasl Kitl/GrsrJ	congenic
 * MGI:4451211	CnbcLmon:NMRI-Tg(tetO-Tyr)1335Lmon Tg(Tyr-rtTA)4111Lmon	Not Specified
 * MGI:4412282	J:DO	Not Specified
 * MGI:4418875	J:NU	Not Specified
 * MGI:3043634	MP1.BP1-Im2/DnmOrl	congenic
 * MGI:3834621	NTac:NIHS	Not Specified
 * MGI:4414936	ORNL:STOCK A/a Oca2<p>	Not Specified
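
Records like these could be screened automatically. The sketch below assumes each record is a tab-separated row of MGI ID, strain name, and strain type (the column meanings are inferred from the examples, not documented here) and reports which of the two rare characters each strain name contains.

```python
import csv
import io

# Three of the rows quoted on this page, as tab-separated text.
SAMPLE = (
    "MGI:2163767\tB10.D2-H11<?>/(55N)SnJ\tcongenic\n"
    "MGI:4412282\tJ:DO\tNot Specified\n"
    "MGI:3834621\tNTac:NIHS\tNot Specified\n"
)
RARE = "?:"


def rare_characters(report_text):
    """Map MGI ID -> list of rare characters found in the strain name."""
    flagged = {}
    reader = csv.reader(
        io.StringIO(report_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    for mgi_id, name, _strain_type in reader:
        found = [char for char in RARE if char in name]
        if found:
            flagged[mgi_id] = found
    return flagged


flagged = rare_characters(SAMPLE)
```

Only the name column is checked, since every MGI ID contains ':' by construction.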

Literature, References, & Examples
Blasius et al. "Mice with mutations of Dock7 have generalized hypopigmentation and white-spotting but show normal neurological function", PNAS (2008)

RGD slide deck

= Term Definitions =

Inbred Mouse Strain
There's an established definition for an "inbred mouse strain" --

"A strain shall be regarded as inbred when it has been mated brother x sister (hereafter called b x s) for twenty or more consecutive generations (F20), and can be traced to a single ancestral breeding pair in the 20th or a subsequent generation. Parent x offspring matings may be substituted for b x s matings provided that, in the case of consecutive parent x offspring matings, the mating in each case is to the younger of the two parents. Exceptionally, other breeding systems may be used, provided that the inbreeding coefficient achieved is at least equal to that at F20 (0.99)."

("Beck JA, Lloyd S, Hafezparast M, Lennon-Pierce M, Eppig JT, Festing MFW, Fisher EMC. 2000. Genealogies of mouse inbred strains. Nature Genetics 24: 23-25. MGI J:59229")
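
The 0.99 figure can be sanity-checked with the standard full-sib mating recurrence F_t = (1 + 2*F_(t-1) + F_(t-2)) / 4. That recurrence is textbook population genetics rather than something stated in the quoted definition, and the sketch assumes the founding pair is unrelated and non-inbred (F = 0) and that generation t counts consecutive b x s matings.

```python
def sib_mating_inbreeding(generations):
    """Return [F_0, F_1, ..., F_n] under continuous brother x sister mating."""
    prev2, prev1 = 0.0, 0.0  # F_(t-2) and F_(t-1); founders assumed F = 0
    values = [0.0]
    for _ in range(generations):
        current = (1 + 2 * prev1 + prev2) / 4
        values.append(current)
        prev2, prev1 = prev1, current
    return values


coefficients = sib_mating_inbreeding(20)
f20 = coefficients[20]  # roughly 0.986, which rounds to the quoted 0.99
```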

Outbred Mouse Strain
From an email by Terry Meehan to the OBI Dev mailing list, explaining how MGI viewed outbred/inbred strains and genotype assumptions:

1) The CD1 strain involved in the background of this mouse is an outbred strain, meaning the genomic sequence will significantly differ between individuals as compared to an inbred strain where siblings have (in theory) identical genomes.

Backcross
Backcrossing is a crossing of a hybrid with one of its parents or an individual genetically similar to its parent, in order to achieve offspring with a genetic identity which is closer to that of the parent. It is used in horticulture, animal breeding and in production of gene knockout organisms. (Wikipedia (9/25/2009))
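
To make "closer to that of the parent" concrete, here is a small worked calculation. It assumes the usual expectation that an F1 hybrid carries half of the recurrent parent's genome and that each backcross halves the remaining donor fraction; this ignores selection and linkage to any selected locus, and it is not taken from the quoted definition.

```python
def expected_parent_fraction(backcrosses):
    """Expected recurrent-parent genome fraction after k backcrosses of an F1."""
    return 1 - 0.5 ** (backcrosses + 1)


# k = 0 is the F1 hybrid itself; later entries are successive backcrosses.
fractions = [expected_parent_fraction(k) for k in range(4)]
```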

Congenic
Organisms that differ in genotype at (ideally) one specified locus. Strictly speaking these are conisogenics. Thus one homozygous strain can be spoken of as being congenic to another. (Biology Online)

Conisogenic
An alternate term for congenic, apparently.

(Not clear why these are typically listed separately?)

Coisogenic
TBD

Intercross
Mating of heterozygotes (a/+ crossed to a/+). (Northwestern University Biochemistry glossary)
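
As a quick illustration of the a/+ x a/+ cross, the sketch below enumerates the four equally likely gamete combinations (a plain Punnett square, assuming simple Mendelian segregation at a single locus). The names are illustrative.

```python
from collections import Counter
from itertools import product

PARENT = ("a", "+")  # each heterozygous parent contributes one of two alleles


def intercross_counts(parent_one=PARENT, parent_two=PARENT):
    """Count offspring genotypes over all gamete combinations."""
    counts = Counter()
    for pair in product(parent_one, parent_two):
        # Write the mutant allele first so a/+ and +/a count as one genotype.
        ordered = sorted(pair, key=lambda allele: allele == "+")
        counts["/".join(ordered)] += 1
    return dict(counts)


counts = intercross_counts()
```

The result is the familiar 1:2:1 ratio of a/a, a/+, and +/+ offspring.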

Haplotype
Contraction of "haploid genotype." A collection of genetic markers, or DNA sequences, which are inherited as a group; a particular combination of such markers is "a haplotype." (Wikipedia 9/25/2009)