IAP 2009 notes 2009-01-13


Semantic Web mechanics
URIs, "triples", documents that connect the URI to the thing it names (denotes).

The following benefits that run under the "Semantic Web" banner are really orthogonal:
 * 1) RDF - provides common, general syntax; enables use of RDF-based tools such as SPARQL
 * 2) Prospectively shared URIs - enables integration
 * 3) Self-describing URIs (URIs that dereference to complete documentation) - enables use by the uninitiated
 * 4) Modeling using "natural" categories and relations (driven by thinking about reality, as opposed to structure of data) - enriches semantics and integration opportunities
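
To make "triples" and point 2 (shared URIs enable integration) concrete, here is a minimal sketch in plain Python. Each statement is a (subject, predicate, object) tuple, and two independently produced sets of triples are merged by set union. Statements that happen to use the same URI then connect without any mapping step. The example.org URIs are made up for illustration. The yeast taxon URI is the one given later in these notes.

```python
# Minimal sketch: RDF-style triples as (subject, predicate, object) tuples.
# The example.org URIs are hypothetical; the NCBITaxon URI is from these notes.
YEAST = "http://purl.org/obo/owl/NCBITaxon#NCBITaxon_4930"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Two sources, produced independently, that both use the same URI for yeast.
source_a = {
    (YEAST, RDFS_LABEL, "yeast"),
}
source_b = {
    ("http://example.org/strain/B", "http://example.org/derivesFrom", YEAST),
}

# Integration is just set union: triples mentioning the same URI now connect.
merged = source_a | source_b

def about(uri, triples):
    """Return every triple in which `uri` appears as subject or object."""
    return {t for t in triples if t[0] == uri or t[2] == uri}

print(len(about(YEAST, merged)))  # -> 2
```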

Tim's stuff - genome annotation
Taking up some of Tim's representation questions. Suppose lineage B is like lineage A, but has a modification in gene Z conferring lethality.

Relevant ontologies (from OBO): PRO (proteins and complexes), SO (sequences), RO (relations), GO (gene products by biological process / cellular component / molecular function), OBI (experiments), PATO (phenotypes)

To model viability (or at least to get terms for it), Alan takes us to the NCBI Taxonomy Browser and to the Ontology Lookup Service (OLS), which finds 'viability' in the PATO ontology.

URI for yeast: http://purl.org/obo/owl/NCBITaxon#NCBITaxon_4930

URI for viability: http://purl.org/obo/owl/PATO#PATO_0000169. Lethal is PATO_0000718 and viable is PATO_0000719.
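
The URIs above follow a visible pattern: base, ontology prefix, "#", prefix, "_", then the numeric ID (zero-padded to 7 digits for PATO, unpadded for NCBITaxon). The sketch below rebuilds them from short IDs such as "PATO 718". The pattern is inferred only from the two URIs in these notes, not from any OBO specification, so treat `obo_purl` as a hypothetical helper.

```python
# Sketch: build OBO-style PURLs from an ontology prefix and a numeric ID,
# following the pattern of the URIs above. The pattern is inferred from
# those two examples only; obo_purl is a hypothetical helper.
BASE = "http://purl.org/obo/owl/"

def obo_purl(prefix, number, width=7):
    """Return BASE + prefix#prefix_NNNNNNN (width=0 means no padding)."""
    digits = str(number).zfill(width)
    return f"{BASE}{prefix}#{prefix}_{digits}"

viability = obo_purl("PATO", 169)
lethal = obo_purl("PATO", 718)
viable = obo_purl("PATO", 719)
yeast = obo_purl("NCBITaxon", 4930, width=0)
print(viability)  # http://purl.org/obo/owl/PATO#PATO_0000169
```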

We want to distinguish between different strains of yeast...

Modeling the experiment using OBI: multiple steps
 * protocol application
 * sequence deletion (modification)
 * inputs: ...
 * outputs: ...

Using Protege 4 to browse the downloaded OBI ontology.

The OBI ontology: http://purl.obofoundry.org/obo/obi.owl. Switching to Protege 3.4 because Protege 4 has trouble with annotations. Searching for "culture" (for example).

Using Protege 3.4 now to create a fresh ontology file, and importing OBI. The search for "culture" no longer works... Press the T button in the upper corner to make the class rdfs:Class visible. Go to Forms; from the pulldown, select rdfs:label as the property to display. (This is not necessary in Protege 4.) ... We told it to do this, but it isn't doing it. (If we don't display by label, we see only the end of the URI, which is just an accession number.)
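
What "display by label" amounts to can be sketched in a few lines: show a term's rdfs:label when one is known, and otherwise fall back to the end of the URI, which for OBO terms is only an accession. This is an illustration of the idea, not how Protege implements it. The `labels` mapping is a hypothetical stand-in for labels read from the ontology.

```python
# Sketch of display-by-label: show rdfs:label when one is known, otherwise
# fall back to the URI fragment, which for OBO terms is just an accession.
# `labels` is a hypothetical {uri: label} mapping, not a Protege API.
from urllib.parse import urldefrag

def display_name(uri, labels):
    """Return a human-readable name for `uri`."""
    if uri in labels:
        return labels[uri]
    fragment = urldefrag(uri).fragment
    # No fragment: use the last path segment instead.
    return fragment or uri.rstrip("/").rsplit("/", 1)[-1]

labels = {"http://purl.org/obo/owl/PATO#PATO_0000719": "viable"}
print(display_name("http://purl.org/obo/owl/PATO#PATO_0000719", labels))  # viable
print(display_name("http://purl.org/obo/owl/PATO#PATO_0000718", labels))  # PATO_0000718
```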

Trying it in Protege 4 to see if it's friendlier... it shows labels (as it should). Let's create one of the experiments (with cultures as inputs and outputs). We make an instance of the class "establishing cell culture", and an instance for the input cells, of class "cell line cells".

What Alan does: Uses Protege 4 to explore the ontologies, models a single instance, writes it out, then writes a script that can generate lots of instances (from some original data source).
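
The "model one instance, then script the rest" step might look roughly like the sketch below: the hand-built instance becomes a template that is filled in once per row of a source table and written out as N-Triples. Everything here is an assumption for illustration: the `id` and `strain` column names, the example.org namespace, and the class and property names (which stand in for the real OBI terms, whose URIs are not given in these notes).

```python
# Sketch: generate many experiment instances from a table using a template.
# Column names, the example.org namespace and the class/property names are
# hypothetical placeholders, not OBI terms.
import csv
import io

RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
RDFS_LABEL = "<http://www.w3.org/2000/01/rdf-schema#label>"
EX = "http://example.org/"

def literal(text):
    # Minimal N-Triples string escaping (backslash and double quote only).
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'

def experiment_triples(row):
    """Fill the one-instance template with the values from one input row."""
    exp = f"<{EX}experiment/{row['id']}>"
    cells = f"<{EX}cells/{row['id']}>"
    return [
        f"{exp} {RDF_TYPE} <{EX}EstablishingCellCulture> .",
        f"{exp} <{EX}hasSpecifiedInput> {cells} .",
        f"{cells} {RDF_TYPE} <{EX}CellLineCells> .",
        f"{cells} {RDFS_LABEL} {literal(row['strain'])} .",
    ]

data = io.StringIO("id,strain\n1,lineage A\n2,lineage B\n")
lines = [t for row in csv.DictReader(data) for t in experiment_triples(row)]
print(len(lines))  # -> 8
```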

(Digression on the individual/property/class partition, first order logic, the embedding in RDF, why not say that classes have class Class, how you know that something's a class instead of an individual, etc.)

Adding yeast explicitly as a taxon under Eukaryota, rather than importing all of the NCBI taxonomy: the tools aren't ready to handle all of the taxa, so we extract just this one. (There is an area of research trying to determine how much of an ontology needs to be extracted in order to use a single term from it.)
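
The crudest answer to "how much do we need to extract?" is the term plus its chain of ancestors. Since the NCBI taxonomy is a tree, each taxon has one parent, and a simple walk up the parent links suffices. This naive sketch is not one of the module-extraction methods studied in that research area, which also account for axioms beyond the parent chain. The hierarchy below is a simplified, hypothetical excerpt.

```python
# Naive sketch: extract a term plus all of its ancestors from a tree-shaped
# taxonomy. Real module-extraction methods consider far more than parent
# links; the toy hierarchy below is simplified and hypothetical.

def extract_ancestors(term, parent_of):
    """Return the term plus every ancestor reachable through parent links."""
    kept = []
    while term is not None and term not in kept:  # guard against cycles
        kept.append(term)
        term = parent_of.get(term)
    return kept

parent_of = {
    "yeast": "fungi",
    "fungi": "Eukaryota",
    "Eukaryota": None,
    "human": "mammals",
}
print(extract_ancestors("yeast", parent_of))  # ['yeast', 'fungi', 'Eukaryota']
```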

The input cells derive from ...

(Problem with Protege 4: it shows some of the labels but not others. It has a problem with annotations: labels (annotations) from imported ontologies are lost. A bug.)

We're going to make a measurement of the culture now, to see whether it is growing. New individual: an assay for growth, of class ??? (we're not sure what kind of assay this is or what the assay class hierarchy looks like, so we just say it's an assay).

Now, an assay result, that is a measurement of ....

A cell that is part of the cell culture lacks a part, namely some DNA, which is described by specifying coordinates within some designated sequence record, such as an NC (RefSeq) record.

Actually it's worse than this - the changes are specified operationally - I did this and that to something that was supposed to have a sequence specified somehow...

but sure, let's have sequence specifications, which are sequences with pedigrees (documentation).
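
A sequence specification in this sense could be as small as a reference record, coordinates on it, and a pedigree saying where the claim came from. The sketch below assumes 1-based inclusive coordinates and free-text provenance. The field names, the `apply_deletion` method, and the "NC_XXXXXX" placeholder accession are all hypothetical.

```python
# Sketch of a "sequence specification": coordinates on a designated reference
# record plus a pedigree. Field names, the 1-based inclusive convention and
# the placeholder accession are assumptions.
from dataclasses import dataclass

@dataclass(frozen=True)
class SequenceSpecification:
    reference_record: str  # e.g. an NC_ accession (placeholder below)
    start: int             # 1-based, inclusive
    end: int               # 1-based, inclusive
    pedigree: str          # free-text provenance / documentation

    def apply_deletion(self, reference_sequence):
        """Return the reference sequence with positions start..end removed."""
        if not 1 <= self.start <= self.end <= len(reference_sequence):
            raise ValueError("coordinates outside the reference sequence")
        return reference_sequence[: self.start - 1] + reference_sequence[self.end :]

spec = SequenceSpecification("NC_XXXXXX", 3, 5, "hypothetical example")
print(spec.apply_deletion("AACGTTT"))  # -> AATT
```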

Abstraction via inference: Instead of exploding each observation into this long story, we may be able to assert, in OWL, that all parts of the story must exist, so that the story is inferred from more concise assertions (class memberships etc.).
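
A toy illustration of the idea: if class-level axioms say "every knockout assay has part some initiation of cell culture and has part some assay", then asserting one class membership already implies that those parts exist, even unnamed. The sketch expands such "has some" requirements into anonymous individuals. It is not an OWL reasoner. The axioms are a simplified, hypothetical reading of the knockout-assay description after the break, and the expansion assumes they are acyclic.

```python
# Toy sketch: class-level "every X <prop> some Y" axioms let one assertion
# imply the rest of the story. Not an OWL reasoner; the axioms are a
# simplified, hypothetical reading of the knockout-assay description.
from itertools import count

REQUIRES = {
    "knockout assay": [("has part", "initiation of cell culture"),
                       ("has part", "assay")],
    "assay": [("has specified output", "measurement datum")],
}

def expand(individual, cls, fresh=None):
    """Return triples implied by asserting `individual` is a `cls` (acyclic axioms)."""
    fresh = fresh if fresh is not None else count(1)
    triples = [(individual, "type", cls)]
    for prop, filler in REQUIRES.get(cls, []):
        anon = f"_:b{next(fresh)}"  # an unnamed individual that must exist
        triples.append((individual, prop, anon))
        triples.extend(expand(anon, filler, fresh))
    return triples

for t in expand("ex:run1", "knockout assay"):
    print(t)
```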

(break)

a knockout assay is a process that
   has part some (initiation of cell culture that
      has specified input some (cell [cell1] that
         derives from some yeast cell
         and lacks part some (DNA that is described by some sequence specification)))
   and has part some (assay that
      has specified input some cell
      and has specified output some (measurement datum that
         is about some (viability that inheres in cell1)))

Alan TBD...

Ilya's stuff - registry of biological parts (synthetic biology)
http://neurocommons.org/w/images/4/42/Biobricks.csv

It's a 4 MB file, and Excel is not very happy with it. Trimmed it in Excel (could have used the Unix 'head' command instead). Looking at the columns now.
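
The `head` route can also be done in a few lines of Python. The sketch reads only the header and the first few rows, so the rest of the 4 MB file is never loaded. Unlike a plain `head -n`, going through the csv module keeps quoted fields that span several lines intact. The UTF-8 encoding is an assumption about the export, and `peek_csv` is a hypothetical helper name.

```python
# Sketch: read only the header and first few rows of a large CSV, as an
# alternative to trimming it in Excel. The UTF-8 encoding is an assumption.
import csv
from itertools import islice

def peek_csv(path, n_rows=5):
    """Return (column names, first n_rows rows as dicts)."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        rows = list(islice(reader, n_rows))
        return reader.fieldnames, rows

# Usage, assuming Biobricks.csv (linked above) was downloaded locally:
# columns, sample = peek_csv("Biobricks.csv")
```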

Parts have subparts (component_list), description, sequence, authors, creation date, version, ...

http://partsregistry.org/

Some questions one might want to ask:
 * Search for function (repressor, ...)
 * Availability
 * Which ones have been used in devices
 * What devices have been built
 * What are similar parts
 * Parts containing the given part
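
The last question ("parts containing the given part") can be answered from the component_list column by inverting it and following it transitively. This is a sketch under assumptions that have not been checked against the real export. Each row is assumed to have a hypothetical `part_name` column, and component_list is assumed to hold part names separated by commas and/or spaces. The part names in the example are made up.

```python
# Sketch: find every part that contains a given part, directly or through
# subparts. Assumptions: a hypothetical "part_name" column, and
# component_list holding names separated by commas and/or whitespace.
import re
from collections import defaultdict

def containers_of(part, rows):
    """Return every part that contains `part`, directly or transitively."""
    used_in = defaultdict(set)
    for row in rows:
        for child in re.split(r"[,\s]+", row.get("component_list") or ""):
            if child:
                used_in[child].add(row["part_name"])
    found, stack = set(), [part]
    while stack:
        for parent in used_in[stack.pop()]:
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found

rows = [
    {"part_name": "P1", "component_list": ""},
    {"part_name": "D1", "component_list": "P1, R1"},
    {"part_name": "D2", "component_list": "D1"},
]
print(sorted(containers_of("P1", rows)))  # ['D1', 'D2']
```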

XQuery overview. We attempted to process a subset of the registry, exported from Excel as XML, using XQuery, but the XML parser reported an error ("SXXP0003: Error reported by XML parser: Content is not allowed in prolog."). Time to validate Excel's XML output.
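
A quick first check on such a file is to look at what precedes the first "<" and to see whether the file parses as XML at all. Stray content ahead of the XML declaration or root element is one common trigger of "Content is not allowed in prolog", though not the only one. This sketch uses Python's standard ElementTree parser rather than Saxon, so its error messages differ from the one above. `check_xml` is a hypothetical helper name.

```python
# Sketch: report what sits before the first "<" of a file and whether the
# file is well-formed XML, using the standard-library parser (not Saxon).
import xml.etree.ElementTree as ET

def check_xml(path):
    """Return (bytes before the first '<', None or a parse error message)."""
    with open(path, "rb") as f:
        data = f.read()
    prefix = data[: data.find(b"<")] if b"<" in data else data
    try:
        ET.fromstring(data)
        error = None
    except ET.ParseError as exc:
        error = str(exc)
    return prefix, error

# Usage, e.g. on the exported file: print(check_xml("registry.xml"))
```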

http://esw.w3.org/topic/ConverterToRdf has a list of converters to RDF.

JAR's XQuery invocation script:

#!/bin/sh
# Run an XQuery file ($1) with Saxon 9; SAXONDIR is the local Saxon install.
SAXONDIR=~/nc/build/external/saxon
java -Xmx1000m \
  -cp "$SAXONDIR/saxon9.jar" net.sf.saxon.Query \
  -s - \
  "$1"

e.g.

saxon registry.rdf convert-registry.xq

OpenLink Software's RDF browser: http://demo.openlinksw.com/DAV/JS/rdfbrowser/index.html