Semantic resources project/MouseModels/Questions

Here are some questions we might ask friendly people at Jax regarding mouse strain nomenclature and the data found at the MGI and IMSR sites.

How to approach
Here are our goals, generally speaking
 * A coherent, controlled, comprehensive namespace of strain names
 * Similarly for genetic modifications (mutations, transgenic manipulation, etc.)
 * Ids for relation(s) between strains and modifications (strain has modification)
 * Ids for relations between strains (in particular: substrain, host, donor)
 * Tables of these relations to the extent they're easy to extract (e.g. from parsing strain names, or MGI reports)
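
As a rough illustration of what such a relation table could look like, here is a minimal Python sketch. The relation ids (`has_host`, `has_donor`, `has_modification`) and the strain name `X.Y-a` are placeholders invented for this example; they are not existing MGI, IMSR, or OBO identifiers.

```python
from typing import List, NamedTuple


class Relation(NamedTuple):
    """One row of a relation table: subject, relation id, object."""
    subject: str
    predicate: str
    object: str


# Hypothetical relation ids; real ids would still have to be minted.
HAS_HOST = "has_host"
HAS_DONOR = "has_donor"
HAS_MODIFICATION = "has_modification"


def to_tsv(relations: List[Relation]) -> str:
    """Serialize the relations as tab-separated lines, one per relation."""
    return "\n".join("\t".join(r) for r in relations)


# Placeholder congenic strain "X.Y-a": host X, donor Y, modification a.
relations = [
    Relation("X.Y-a", HAS_HOST, "X"),
    Relation("X.Y-a", HAS_DONOR, "Y"),
    Relation("X.Y-a", HAS_MODIFICATION, "a"),
]

print(to_tsv(relations))
```

A substrain relation would fit the same shape once the substrain-versus-subclass question is settled.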

To ask Michael Sasner
 * Michael says that we should consider using the IMSR id as the main id for strains, because some IMSR strains will not have MGI ids: assigning one takes curation effort (an official strain name and curated allele information). (IMSR ids do not occur in any of the publicly available reports, so we'd need to get the associations from Janan.)
 * There are many strains for sale at Jax that don't have MGI accession numbers in the strain file (about 36% of them). Should we expect the MGI strain list to be complete? [Michael expects that most of the Jax strains should have MGI numbers.] Is it reasonable to request that accession numbers be allocated for all IMSR strains? [Michael says no, as IMSR strain names are not all well curated.]
 * Can we get a table associating Jax stock numbers with strain names? (Alzforum cites strains by stock number.) [Yes, ask Janan]
 * (Are there other pieces of information we could ask Jax for? For instance, tables associating Jax stock numbers with PubMed IDs, MP phenotype terms, or the "phenotype descriptions" that have been taken from papers?)

To ask Janan Eppig
Re use of MGI ids as the OBO identifier space for strains:
 * There are upwards of 200 strain names that occur as the host, donor, 'helper', or background in a congenic or coisogenic strain name, but do not have MGI ids. (See ../Missing_strains.) Is it appropriate to ask for MGI ids for these strains? Examples: WLC, 129X1, SEACGn, D. (We could then use these MGI ids in our representation of the relationships between strains.) If not, we'll have to look for a different idspace for strains. Is the MGI set a subset of the IMSR set suggested by Mike?
 * Similarly, the IMSR report for Jax lists 1863 strains that do not have MGI ids.
 * How are the MGI accession numbers for strains currently used - what deserves to get an MGI number and what doesn't?
 * Similarly, there are allele names occurring in strain names that are not in the alleles report, e.g. Pax6<GsfAey11>. Maybe we can review ../Missing_alleles.
 * Any coordination with NCBI Taxonomy?
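
The check behind the "missing strains" question can be sketched as a simple lookup: collect the names that appear as host, donor, or background, then list those with no entry in the MGI strain table. In this sketch the function name `strains_without_mgi_id`, the in-memory table, and the `MGI:PLACEHOLDER` value are all assumptions made for illustration; the real input would be the MGI strain report.

```python
from typing import Dict, Iterable, List


def strains_without_mgi_id(referenced: Iterable[str],
                           mgi_ids: Dict[str, str]) -> List[str]:
    """Return the referenced strain names that have no MGI id, sorted."""
    return sorted(name for name in set(referenced) if name not in mgi_ids)


# Stand-in for the MGI strain table (strain name -> MGI id).
# "MGI:PLACEHOLDER" is not a real accession number.
mgi_ids = {"C57BL/6J": "MGI:PLACEHOLDER"}

# Names found as host, donor, or background in other strain names.
referenced = ["C57BL/6J", "WLC", "129X1", "SEACGn", "D", "WLC"]

print(strains_without_mgi_id(referenced, mgi_ids))
```

The same lookup would work for alleles named in strain names, checked against the alleles report.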

Re parsing strain names:
 * When we have two strains A and A/Be, is this a substrain relationship, or a subclass relationship? Subclass would mean that if a mouse belongs to strain A/Be, it also belongs to strain A, while substrain would mean that this is never the case (i.e. some lab originated strain A and that lab is not the Be lab). (My guess is the former.) Or do we just not know what the relationship is?
 * The strain nomenclature guide doesn't seem to cover all cases, e.g. the --f-- convention, use of a species name as a strain name, unusual use of spaces (e.g. MRL/MpJ Fas -Cal2/J), and multiple dots (e.g. B6.CAST-Gpi1.Cg-Hba). Are you interested in updating it to cover such cases, and is there anything we can do to help?
 * What should we do about inconsistency between a strain's nomenclature and its 'type' as given in the strain file? For example, 129-Elane Ctsg/H (MGI:3716317) is listed as type congenic in the strain file, but its name has the syntax of a coisogenic strain. (There are many other examples.)
 * If 'AK' is an abbreviation for 'AKR strains', what is the difference between a congenic strain written AKR. and one written AK. ?
 * (Eventually work out a process going forward for resolving ambiguous or complex parses, e.g. CAL, CAnBomUrd)
 * (Not ready yet: ask about ../Missing_alleles = alleles mentioned in the strains report but not listed in the phenotypic alleles report - the list needs to be filtered and organized. It detects at least one mistake, and there may be others. Alan suggests picking a representative example from each class of discrepancy.)
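
To make the parsing questions concrete, here is a minimal Python sketch of the kind of check that surfaces the type/syntax inconsistencies. It assumes two deliberately simplified patterns, HOST.DONOR-alleles for congenic names and STRAIN-alleles for coisogenic names. It is not the official nomenclature grammar, and `parse_strain_name` and `type_conflicts_with_syntax` are hypothetical helpers written for this example.

```python
from typing import NamedTuple, Optional


class ParsedName(NamedTuple):
    background: str        # host strain (congenic) or the strain itself
    donor: Optional[str]   # donor strain, for congenic syntax only
    remainder: str         # everything after the first hyphen
    syntax: str            # "congenic", "coisogenic", or "other"
    needs_review: bool     # True for shapes the simple rules don't cover


def parse_strain_name(name: str) -> ParsedName:
    """Classify a strain name by its surface syntax only."""
    background, hyphen, remainder = name.partition("-")
    if not hyphen:
        # No hyphen: not congenic or coisogenic syntax (e.g. A/Be).
        return ParsedName(name, None, "", "other", False)
    # Flag multiple dots and spaces before the hyphen for manual review.
    needs_review = "." in remainder or " " in background
    if "." in background:
        host, _, donor = background.partition(".")
        return ParsedName(host, donor, remainder, "congenic", needs_review)
    return ParsedName(background, None, remainder, "coisogenic", needs_review)


def type_conflicts_with_syntax(declared_type: str, name: str) -> bool:
    """True if the strain file's type disagrees with the name's syntax."""
    syntax = parse_strain_name(name).syntax
    return declared_type in ("congenic", "coisogenic") and syntax != declared_type


print(parse_strain_name("129-Elane Ctsg/H"))
print(type_conflicts_with_syntax("congenic", "129-Elane Ctsg/H"))
print(parse_strain_name("B6.CAST-Gpi1.Cg-Hba"))
```

Names flagged with `needs_review` (multiple dots, spaces before the hyphen) are the ones that would go through whatever manual resolution process is eventually agreed on.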