
The following are some notes, dictated by Paolo C. on 11/18/2009, concerning the differences between the SWAN 1.0 and 1.2 ontologies. They center on how each version models citation.

SWAN 1.0


Class Hierarchy in SWAN 1.0:



Classes
Counts of the individuals in each of the separate classes mentioned in the SWAN 1.0 RDF export file.

Count   Class
1       http://boca.adtech.ibm.com/schemas/queso#IntrospectionGraph
5332    swanadmin:SWANACL
257     swanadmin:SWANRole
254     swanadmin:SWANUser
16461   swancollect:Collection
30026   swancollect:Item
16461   swancollect:OrderedCollection
30026   swancollect:OrderedItem
13188   swan:AuthorItem
7663    swan:AuthorsList
45      swan:Comment
8189    swan:Concept
1785    swan:DigitalResource
1946    swan:DiscourseElement
16838   swan:DiscourseElementItem
8798    swan:DiscourseList
5349    swan:Gene
1740    swan:JournalArticle
7       swan:JournalComment
5       swan:JournalNews
8014    swan:KnownPerson
8189    swan:LifeScienceEntity
1       swan:NewspaperArticle
29      swan:Organism
8014    swan:Person
2811    swan:Protein
1741    swan:PublishedArticle
39      swan:PublishedComment
5       swan:PublishedNews
51      swan:ResearchQuestion
1850    swan:ResearchStatement
16461   swan:SWANList
30026   swan:SWANListItem
19934   swan:SWANThing
32      swan:WebComment
3731    http://www.w3.org/1999/02/22-rdf-syntax-ns#Seq

Total # of Triples: 827082

Object Properties
Number of triples with each different predicate that has rdf:type owl:ObjectProperty.

Count   Property
26767   swancollect:nextItem
13188   swan:pointsToAuthor
26767   swancollect:previousItem
16838   swan:pointsToDiscourseElement
6883    swan:enteredBy
51262   swan:hasOrtholog
8161    swan:hasNative
2812    swan:encodedBy
405     swan:derivedFromProtein
2816    swan:encodes
2781    swan:citesAsSupportingEvidence
710     swan:supports
1240    swan:citesLifeScienceEntity
1945    swan:containsDiscourseList
1562    swan:derivedFrom
3730    swan:hasAuthorsList
420     swan:hasDerivedProtein
29606   swancollect:item
3259    swancollect:firstItem
1428    swan:curatedBy
28      swan:motivatedBy
215     swan:alternativeTo
134     swan:refutes
40      swan:discusses
1716    swan:contains
40      swan:inResponseTo
20      swan:citesAsDiscussingEvidence
1       swan:citesAsRefutingEvidence
0       swancollect:itemObject
0       swan:annotates
0       swan:citesAsDiscourseEvidence
0       swan:discoursesSWANEntity
0       swan:cites
0       swan:seeAlso
0       swan:evolvedFrom
0       swan:authoredBy
0       swan:citesExternalEntity
0       swan:citesConcept
0       swan:geneVariantOf
0       swan:nativeIn
0       swan:proteinVariantOf
0       swan:hasGeneVariant
0       swan:hasProteinVariant
0       swan:partOf
0       swan:hasPart
0       swan:hasDerived
0       swan:orthologOf

Journals
SWAN Journals

Authors
SWAN Authors

The general layout of the JournalArticle items in SWAN 1.0 includes an AuthorsList element:

 a swan:JournalArticle ;
    swancore:hasAuthorsList [
        swancollections:item [ swancore:pointsToAuthor . ] .
    ] .
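As a rough illustration of how an authors list of this shape might be read back in order, the sketch below walks a list from its first item to its last and collects the authors it points to. It assumes, without confirmation from the export itself, that swancollect:firstItem and swancollect:nextItem chain the items of a list (both appear in the property counts above). The triples and every `urn:example:` identifier are invented for the example.

```python
# Toy triples standing in for a SWAN 1.0 export.
# All urn:example: identifiers are hypothetical.
TRIPLES = [
    ("urn:example:article1", "swan:hasAuthorsList", "urn:example:list1"),
    ("urn:example:list1", "swancollect:firstItem", "urn:example:item1"),
    ("urn:example:item1", "swancollect:nextItem", "urn:example:item2"),
    ("urn:example:item1", "swan:pointsToAuthor", "urn:example:personA"),
    ("urn:example:item2", "swan:pointsToAuthor", "urn:example:personB"),
]


def objects(triples, subject, predicate):
    """Return every object of (subject, predicate, ?) in document order."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def ordered_authors(triples, article):
    """Follow firstItem/nextItem links and collect authors in list order."""
    authors = []
    for author_list in objects(triples, article, "swan:hasAuthorsList"):
        seen = set()  # guards against a cyclic nextItem chain
        first = objects(triples, author_list, "swancollect:firstItem")
        item = first[0] if first else None
        while item is not None and item not in seen:
            seen.add(item)
            authors.extend(objects(triples, item, "swan:pointsToAuthor"))
            following = objects(triples, item, "swancollect:nextItem")
            item = following[0] if following else None
    return authors


print(ordered_authors(TRIPLES, "urn:example:article1"))
```

If the real export orders items some other way (for example through rdf:Seq membership), the traversal would need to change accordingly.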

The general layout of the author item itself is:

urn:lsid:swan.org:knownperson:2ea3adc3-37ea-422b-b05b-89634ae6dc19
    rdf:type swancore:Person ;
    rdf:type swancore:SWANThing ;
    rdf:type swancore:KnownPerson ;
    swancore:enteredBy  ;
    swancore:modificationDate "2009-10-14T17:03:42.520Z" ^^ ;
    swancore:creationDate "2009-10-14T17:03:42.520Z" ^^ ;
    swancore:importDate "2009-10-14T17:03:42.512Z" ^^ ;
    swancore:forname "Sam" ^^ ;
    swancore:modificationTimestamp "1255539822520" ^^ ;
    swancore:surname "Johnson" ^^ .

SWAN 1.2
Draft Description of SWAN (1.2)



SWAN 1.2 Prefix: http://purl.org/swan/1.2/

= Notes on the SWAN Conversion =

SWAN 1.2 Prefix: http://swan.mindinformatics.org/ontologies/1.2/

SPARQL Prefix Block:

PREFIX swan2:
PREFIX de:
PREFIX dr:
PREFIX qual:
PREFIX rsqual:
PREFIX agents:  <http://swan.mindinformatics.org/ontologies/1.2/agents/>
PREFIX collect: <http://swan.mindinformatics.org/ontologies/1.2/collections/>
PREFIX pav:     <http://swan.mindinformatics.org/ontologies/1.2/pav/>
PREFIX foaf:    <http://xmlns.com/foaf/0.1/>
PREFIX who:     <http://swan.mindinformatics.org/whoweare.html#>
PREFIX owl:     <http://www.w3.org/2002/07/owl#>
PREFIX rdfs:    <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

KnownPerson Class
We start by converting the swan:KnownPerson individuals and generating the corresponding URIs for their relations. The corresponding literals are typed as xsd:string.
 * Each swan:KnownPerson in swan1 becomes an agents:PersonName in swan2.
 * Some people are related to a foaf:Person?
 * Associated properties: swan:surname, swan:forname, swan:middleName, and swan:fullName
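The conversion described above could be sketched roughly as follows. This is only an illustration under stated assumptions: it reuses the SWAN 1.0 identifier as the subject, and it keeps the 1.0 predicate names on the output as placeholders, because these notes do not say which SWAN 1.2 properties replace them. The function name `convert_known_person` is hypothetical.

```python
# Name properties listed for swan:KnownPerson in these notes.
NAME_PROPERTIES = ("swan:surname", "swan:forname", "swan:middleName", "swan:fullName")


def convert_known_person(subject, properties):
    """Return swan2-style triples for one SWAN 1.0 KnownPerson.

    `properties` maps SWAN 1.0 predicate names to plain string values.
    Assumption: the 1.0 predicate names are kept as placeholders, since
    the target 1.2 property names are not given in the notes.
    """
    triples = [(subject, "rdf:type", "agents:PersonName")]
    for predicate in NAME_PROPERTIES:
        value = properties.get(predicate)
        if value:  # skip missing or empty name parts
            # Literals are carried over typed as xsd:string.
            triples.append((subject, predicate, (value, "xsd:string")))
    return triples


person = {"swan:forname": "Sam", "swan:surname": "Johnson"}
subject = "urn:lsid:swan.org:knownperson:2ea3adc3-37ea-422b-b05b-89634ae6dc19"
for triple in convert_known_person(subject, person):
    print(triple)
```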

Known People already in SWAN 1.2

DiscourseElement Class

 * de:ResearchStatement
 * Hypothesis	(de:ResearchStatement and (dr:researchStatementQualifiedAs value rsqual:hypothesis))
 * Claim	(de:ResearchStatement and (dr:researchStatementQualifiedAs value rsqual:claim))
 * de:ResearchQuestion
 * de:ResearchComment

A collection of all the relations on DiscourseElements from SWAN 1.0:


 * swan:refutes
 * swan:supports
 * swan:discusses
 * swan:motivatedBy
 * swan:alternativeTo
 * swan:inResponseTo


 * swanadmin:isDraft
   * Not converted
 * swanadmin:aCL
   * Not converted


 * swan:creationDate
 * swan:modificationDate
 * swan:modificationTimestamp
 * swan:curatedBy
 * swan:enteredBy


 * swan:description
 * swan:title


 * swan:containsDiscourseList
 * swan:contains
 * swan:authorSequence
 * swan:hasAuthorsList


 * swan:hasPathogenicNarrativeQualifier
 * swan:derivedFrom
   * DOMAIN=DiscourseElement
   * RANGE=DigitalResource


 * swan:hasEvidencetypeQualifier
 * swan:researchStatementQualifier


 * swan:citesLifeScienceEntity


 * swan:citesAsSupportingEvidence
 * swan:citesAsRefutingEvidence
 * swan:citesAsDiscussingEvidence

An example DiscourseElement:

urn:lsid:swan.org:researchstatement:ed3f6c7d-e597-43f2-a790-a5aeffb966c3
    rdf:type SWAN Thing , swan:DiscourseElement , swan:ResearchStatement ;
    swan:authorSequence blank#1622 ;
    swan:citesAsSupportingEvidence
        urn:lsid:ncbi.nlm.nih.gov:pubmed:16782885 ,
        urn:lsid:ncbi.nlm.nih.gov:pubmed:12514700 ,
        urn:lsid:ncbi.nlm.nih.gov:pubmed:12829747 ,
        urn:lsid:ncbi.nlm.nih.gov:pubmed:15642747 ,
        urn:lsid:ncbi.nlm.nih.gov:pubmed:12112088 ,
        urn:lsid:ncbi.nlm.nih.gov:pubmed:12223024 ,
        urn:lsid:ncbi.nlm.nih.gov:pubmed:11466313 ;
    swan:citesLifeScienceEntity Beta-secretase 1 ;
    swan:containsDiscourseList urn:lsid:swan.org:discourseelementitem:1f7ea838-7b95-4015-bd0b-aafed110eedd ;
    swan:creationDate "2008-12-09T15:52:58.907Z" ;
    swan:curatedBy urn:lsid:swan.org:knownperson:cbed1d25-059b-420c-a8b9-cfe4c53c4df6 ;
    swan:derivedFrom urn:lsid:ncbi.nlm.nih.gov:pubmed:18322404 ;
    swan:description "CAD cells that contain Aβ accumulations within neurites also contain BACE1, the major β-secretase, at the same location, coincident with Aβ. The extent of BACE1 accumulation within the neuronal processes of these cells is abnormal, since this enzyme is normally localized to Golgi compartments and early endosomes in the cell body. This result suggests that, in vivo, mislocalization of BACE1 may cause the production and accumulation of Aβ within processes and synaptic regions. This result is in line with a recent report that proposes that BACE1 localization – in addition to its expression level - determines the amount of generated Aβ and its accumulation in plaques." ;
    swan:enteredBy urn:lsid:swan.org:knownperson:842e1de5-b5ae-4899-81d2-045585bf4cf7 ;
    swan:hasAuthorsList urn:lsid:swan.org:authorslist:fbf1324d-8f43-45c4-809f-0e6d05a1a81a ;
    swan:hasEvidencetypeQualifier "" , "Mechanism - biochemistry and structural biology" , "Mechanism - molecular and cell" ;
    swan:hasPathogenicNarrativeQualifier "" , "Pathogenic event" ;
    swan:modificationDate "2008-12-09T15:52:59.018Z" ;
    swan:modificationTimestamp "1228837979018" ;
    swan:researchStatementQualifier "http://swan.mindinformatics.org/ontology/1.0/20070313/core.owl#Claim" ;
    swan:supports urn:lsid:swan.org:researchstatement:2fc5ce54-c97b-44dd-843c-892009e518e8 ;
    swan:title "CAD cells that accumulate Aβ show redistribution of β-secretase (BACE1) to the processes, where it co-localizes with Aβ." ;
    swanadmin:isDraft "false" .

DiscourseRelation properties
Converting the relations of SWAN Discourse Elements:

Total of 1946 Discourse Elements: 1850 Research Statements, 51 Research Questions, and 45 Comments.

From Paolo, the transliterated relations among discourse elements are:

researchStatementQualifier -> dr:researchStatementQualifiedAs

http://swan.mindinformatics.org/ontology/1.0/20070313/core.owl#Hypothesis -> rsqual:hypothesis

http://swan.mindinformatics.org/ontology/1.0/20070313/core.owl#Claim -> rsqual:claim
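The mapping above is small enough to express as a lookup table. The following is a minimal sketch, assuming triples are handled as plain tuples of strings. The function name `convert_qualifier` is hypothetical, and only the two mappings given in these notes are included.

```python
CORE_1_0 = "http://swan.mindinformatics.org/ontology/1.0/20070313/core.owl#"

# SWAN 1.0 qualifier URI -> SWAN 1.2 qualifier individual.
QUALIFIER_MAP = {
    CORE_1_0 + "Hypothesis": "rsqual:hypothesis",
    CORE_1_0 + "Claim": "rsqual:claim",
}


def convert_qualifier(subject, qualifier_uri):
    """Map one swan:researchStatementQualifier value to a SWAN 1.2 triple.

    Returns None when the notes give no mapping for the qualifier,
    so that unmapped values can be reviewed rather than guessed at.
    """
    target = QUALIFIER_MAP.get(qualifier_uri)
    if target is None:
        return None
    return (subject, "dr:researchStatementQualifiedAs", target)


statement = "urn:lsid:swan.org:researchstatement:ed3f6c7d-e597-43f2-a790-a5aeffb966c3"
print(convert_qualifier(statement, CORE_1_0 + "Claim"))
```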

(Question: How should the other qualifiers be carried over into 1.2?)

SWAN 1.0 Evidence Type Qualifiers:

 * 0: (blank)
 * 1: Mechanism - physiological systems
 * 2: Epidemiology
 * 3: Neuropsychology/Behavior
 * 4: Genetics
 * 5: Pathophysiology
 * 6: Biomarkers
 * 7: Clinical trials
 * 8: Animal models
 * 9: Target validation
 * 10: Mechanism - biochemistry and structural biology
 * 11: Mechanism - molecular and cell

SWAN 1.0 Pathogenic Narrative Qualifiers:

 * 0: (blank)
 * 1: Initial condition
 * 2: Pathogenic event
 * 3: Perturbation
 * 4: Pathologic change

(Question: Notice the blank qualifiers in both lists. What do we do with these?  (Drop them, almost certainly.))
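If the blank qualifiers are indeed dropped, the filtering step is trivial. A minimal sketch, assuming qualifier values arrive as plain strings and that whitespace-only values count as blank (the helper name `drop_blank_qualifiers` is hypothetical):

```python
def drop_blank_qualifiers(values):
    """Return the qualifier strings that are not empty or whitespace-only."""
    return [value for value in values if value.strip()]


# Values taken from the example DiscourseElement in these notes.
evidence_types = [
    "",
    "Mechanism - biochemistry and structural biology",
    "Mechanism - molecular and cell",
]
print(drop_blank_qualifiers(evidence_types))
```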

(Question: SWAN 1.2 has an "inResponseToList" property -- how do I generate the list of entities to which a Discourse Element is inResponseTo? Especially since the corresponding property in 1.0 doesn't appear to be ordered?)
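One possible answer, offered only as a sketch since the question is still open: if SWAN 1.0 carries no ordering for swan:inResponseTo, the list could at least be made deterministic by sorting the targets by URI. The sort key is an arbitrary assumption, not something either ontology prescribes, and the `urn:example:` identifiers are invented.

```python
def in_response_to_list(triples, subject):
    """Collect the swan:inResponseTo targets of `subject` in a stable order.

    Assumption: SWAN 1.0 records no order, so targets are de-duplicated
    and sorted by URI purely to make repeated conversions reproducible.
    """
    targets = {o for s, p, o in triples if s == subject and p == "swan:inResponseTo"}
    return sorted(targets)


# Hypothetical triples for illustration only.
triples = [
    ("urn:example:comment1", "swan:inResponseTo", "urn:example:statementB"),
    ("urn:example:comment1", "swan:inResponseTo", "urn:example:statementA"),
    ("urn:example:comment2", "swan:inResponseTo", "urn:example:statementC"),
]
print(in_response_to_list(triples, "urn:example:comment1"))
```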

Protein Conversion
-- SWAN relations: swan:modificationDate, swan:enteredBy, swan:modificationTimestamp

-- LSE relations: swan:hasName, swan:hasPreferredName, swan:hasNative

-- Protein relations: swan:derivedFromProtein, swan:encodedBy, swan:hasAuthoritativeId, swan:hasAuthoritativeSource, swan:hasAccessionNumber, swan:hasDerivedProtein

(Question: what's the difference between hasAuthoritativeId and hasAccessionNumber for each Protein?)

LSE Conversion
Example Organism:

urn:lsid:swan.org:organism:38521014-c95f-47ac-8a08-f5b971999d8d
    rdf:type SWAN Thing , swan:LifeScienceEntity , swan:Concept , Organism ;
    swan:hasAuthoritativeId "180454" ;
    swan:hasAuthoritativeSource "Uniprot" ;
    swan:hasName "" , "Anopheles gambiae str. PEST" ;
    swan:hasPreferredName "Anopheles gambiae str. PEST" ;
    swan:lineage "cellular organisms,Eukaryota,Fungi/Metazoa group,Metazoa,Eumetazoa,Bilateria,Coelomata,Protostomia,Panarthropoda,Arthropoda,Mandibulata,Pancrustacea,Hexapoda,Insecta,Dicondylia,Pterygota,Neoptera,Endopterygota,Diptera,Nematocera,Culicimorpha,Culicoidea,Culicidae,Anophelinae,Anopheles,Cellia,Pyretophorus,gambiae species complex,Anopheles gambiae" ;
    swan:modificationDate "2008-03-31T20:22:09.265Z" ;
    swan:modificationTimestamp "1206994929265" .

Notes: (a) Organisms have UniProt identifiers. In this case, the UniProt page http://www.uniprot.org/taxonomy/180454 links to the NCBI Taxonomy page http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?lvl=0&id=180454, which suggests, at the very least, that the identifier numbers in UniProt are taken from NCBI (or vice versa?).
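To spot-check other organisms the same way, both page addresses can be built from swan:hasAuthoritativeId. The sketch below simply reuses the two URL patterns quoted in the note above. Whether those patterns hold for every organism is an assumption, and the function name `taxonomy_urls` is hypothetical.

```python
# URL patterns copied from the two pages cited in the note above.
UNIPROT_TAXONOMY = "http://www.uniprot.org/taxonomy/{id}"
NCBI_TAXONOMY = "http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?lvl=0&id={id}"


def taxonomy_urls(authoritative_id):
    """Build the UniProt and NCBI taxonomy page URLs for one organism ID."""
    return {
        "uniprot": UNIPROT_TAXONOMY.format(id=authoritative_id),
        "ncbi": NCBI_TAXONOMY.format(id=authoritative_id),
    }


for source, url in taxonomy_urls("180454").items():
    print(source, url)
```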

Example Gene:

urn:lsid:swan.org:gene:c46fc7e7-1d58-4c1b-80f0-0b0d3c9f2d21
    rdf:type swan:SWANThing ("SWAN Thing") , swan:LifeScienceEntity , swan:Gene ("Gene") , swan:Concept ;
    swan:chromosome "14" ;
    swan:fullName "cathepsin G" ;
    swan:hasAuthoritativeId "467413" ;
    swan:hasAuthoritativeSource "Entrez" ;
    swan:hasNative urn:lsid:swan.org:organism:69c1afdf-05d8-4413-8802-61dfa612dad3 ("Pan troglodytes") ;
    swan:hasOrtholog
        urn:lsid:swan.org:gene:d81c3033-260a-4641-ae41-58e8e341af8f ,
        urn:lsid:swan.org:gene:fc80e4ee-fb25-4d9b-a956-bd29bcb9d83a ,
        urn:lsid:swan.org:gene:1592d936-26f9-4041-b12a-1460bfd94236 ,
        urn:lsid:swan.org:gene:246ed3ec-b8cb-427f-b511-22b0086f42b7 ,
        urn:lsid:swan.org:gene:6a6b3abd-d72a-410a-92d0-98b2c17cfd41 ,
        urn:lsid:swan.org:gene:41a3ca1a-3559-4878-befe-38102626ff22 ,
        urn:lsid:swan.org:gene:2261ddc8-3758-4896-bb43-5434a9f204be ;
    swan:hasSymbol "CTSG" ;
    swan:modificationDate "2008-07-30T22:30:57.312Z" ;
    swan:modificationTimestamp "1217457057312" .

(Question: What's the difference, in SWAN 1.2, between "hasOrtholog" and "orthologOf"? They're inverses, but I wasn't aware that being-an-ortholog-of was a directional relationship...?)