SPARQL in Scheme

This page describes an embedding of SPARQL in Scheme.

The code is in svn.

The basic idea
RDF and SPARQL are particular syntaxes for first-order logic. When first-order logic is used as a knowledge representation language, names (URIs) are bound to things in the world, or to other entities that are not necessarily computational. In Scheme, names are bound to computational entities such as numbers, lists, procedures, and macros. To blend the two, we need to explain the relation between things in the world and things in the computer.

The approach taken here is to say that a name, say "Paris", has dual interpretations, one as something in the world (a city) and one as a computational entity. There is a relation between the two: the computational entity is a "doppelganger" for the city. The purpose of the doppelganger is to act as a handle when attempting to reason about the thing.

In practice this means, for the current iteration of this software, that the doppelganger carries the thing's URI, and that's pretty much it. The URI is wrapped in a record so that it doesn't get confused with a string, i.e. it's rendered in Turtle or SPARQL as <foo> rather than "foo". In future iterations it may carry properties of the thing, but that can wait.

Individuals
Doppelgangers for logical individuals (things) are created like so:

    (define paris (individual-named "http://example.com/paris"))

We can render a doppelganger as a Turtle term using write-term:

    (write-term paris (current-output-port))
    <http://example.com/paris>
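To make the doppelganger idea concrete, here is a minimal sketch in R7RS Scheme of how individual-named and write-term might work. It is not the code in svn: the record type, its field name, and the choice to render plain strings with Scheme's write are all assumptions made for illustration.

```scheme
(import (scheme base) (scheme write))

;; Hypothetical sketch (not the code in svn): a doppelganger is a record
;; wrapping a URI string, so it cannot be mistaken for an ordinary string.
(define-record-type <doppelganger>
  (make-doppelganger uri)
  doppelganger?
  (uri doppelganger-uri))

(define (individual-named uri)
  (make-doppelganger uri))

;; Doppelgangers render as IRIs in angle brackets; strings render as
;; quoted literals (Scheme's `write` escaping only approximates Turtle's).
(define (write-term term port)
  (cond ((doppelganger? term)
         (write-string "<" port)
         (write-string (doppelganger-uri term) port)
         (write-string ">" port))
        ((string? term) (write term port))
        (else (error "write-term: unsupported term" term))))

(define paris (individual-named "http://example.com/paris"))
(write-term paris (current-output-port))   ; prints <http://example.com/paris>
(newline)
(write-term "http://example.com/paris" (current-output-port))
(newline)                                  ; prints "http://example.com/paris"
```

The same URI string comes out as an IRI or as a string literal depending only on whether it was wrapped, which is the confusion the record is meant to prevent.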

Predicates and statements
Predicates are abstractions of logical propositions, i.e. they are functions from some arguments to a truth value. In our Scheme world, predicate names are bound to Scheme procedures that are statement constructors: they return not Scheme truth values but statements that can be rendered as Turtle or SPARQL. That Turtle or SPARQL can then be interpreted as true or false about the world.

A one-place predicate is a class, and a two-place predicate is a "property" (the unfortunate term used in RDF and OWL). The following creates a property's statement-constructor procedure:

    (define has-capital (property-named "http://example.com/hasCapital"))

and this creates a class's statement-constructor procedure:

    (define city (class-named "http://example.com/City"))

We can form statements using the usual prefix Scheme syntax (P x y), where P denotes a predicate. Interpreted as Scheme expressions, these just call the statement constructors:

    (define france (individual-named "http://example.com/france"))
    (has-capital france paris)

The resulting statement can be rendered as Turtle or SPARQL with write-statement:

    (write-statement (has-capital france paris) (current-output-port))
    <http://example.com/france> <http://example.com/hasCapital> <http://example.com/paris> .

(Beware: Scheme uses prefix notation, so we write (P x y), while RDF is an infix language, so we write x P y .)

We form the conjunction of several statements using the conjunction operator:

    (conjunction s1 s2 ...)

In RDF, conjunctions are called "graphs".
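The statement constructors described in this section can be sketched as follows. This assumes, hypothetically, that a statement is represented as a list of RDF triples, so that conjunction is just list concatenation and write-statement emits one infix triple per line. The representation, the rdf-type helper, and the use of the full rdf:type URI for class statements are assumptions of this sketch, not details of the svn code.

```scheme
(import (scheme base))

;; Hypothetical toy model, not the svn code: a doppelganger wraps a URI,
;; and a statement is a list of triples (a small RDF graph).
(define-record-type <doppelganger>
  (make-doppelganger uri) doppelganger? (uri doppelganger-uri))
(define (individual-named uri) (make-doppelganger uri))

(define rdf-type
  (make-doppelganger "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"))

;; A property is a two-place statement constructor: (P x y) => x P y .
(define (property-named uri)
  (let ((p (make-doppelganger uri)))
    (lambda (x y) (list (list x p y)))))

;; A class is a one-place statement constructor: (C x) => x rdf:type C .
(define (class-named uri)
  (let ((c (make-doppelganger uri)))
    (lambda (x) (list (list x rdf-type c)))))

;; Conjunction of statements: here, simply the concatenation of their triples.
(define (conjunction . statements) (apply append statements))

(define (write-term term port)
  (write-string "<" port)
  (write-string (doppelganger-uri term) port)
  (write-string ">" port))

;; Write each triple in infix order, one per line, terminated by " ."
(define (write-statement statement port)
  (for-each (lambda (triple)
              (for-each (lambda (term)
                          (write-term term port)
                          (write-string " " port))
                        triple)
              (write-string "." port)
              (newline port))
            statement))

(define paris (individual-named "http://example.com/paris"))
(define france (individual-named "http://example.com/france"))
(define has-capital (property-named "http://example.com/hasCapital"))
(define city (class-named "http://example.com/City"))

(write-statement (conjunction (has-capital france paris) (city paris))
                 (current-output-port))
;; <http://example.com/france> <http://example.com/hasCapital> <http://example.com/paris> .
;; <http://example.com/paris> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://example.com/City> .
```

Representing even a single statement as a list of triples keeps conjunction trivial: a conjunction of graphs is again a graph.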

Abstractions
OK, the interesting thing here is predicate, which is the analog of Scheme's lambda. It has the same syntax as lambda:

    (predicate (?var1 ?var2 ...)
      ... statements involving ?var1 and ?var2 ...)

and similar semantics:

    (define ville (predicate (?x) (city ?x)))
    (write-statement (ville paris) (current-output-port))
    <http://example.com/paris> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://example.com/City> .

The body of a predicate form is processed as the conjunction of its statements.

The difference from lambda is that a predicate can be serialized as a SPARQL query:

    (write-predicate-as-select ville (current-output-port))
    select ?x where { ?x <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://example.com/City> . }
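One way predicate and write-predicate-as-select could be realized is sketched below. It assumes that predicate is a syntax-rules macro expanding into a lambda, and that the macro records the lambda's parameter names in a table (the hypothetical predicate-parameters), so the procedure can later be applied to SPARQL variables instead of individuals. The actual implementation may well differ; the sketch only shows that one procedure can serve both as a statement constructor and as a query.

```scheme
(import (scheme base))

;; Hypothetical toy model, not the svn code.
(define-record-type <doppelganger>
  (make-doppelganger uri) doppelganger? (uri doppelganger-uri))
(define-record-type <variable>          ; a SPARQL query variable such as ?x
  (make-variable name) variable? (name variable-name))

(define rdf-type
  (make-doppelganger "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"))
(define (class-named uri)               ; statements are lists of triples
  (let ((c (make-doppelganger uri)))
    (lambda (x) (list (list x rdf-type c)))))

;; Assumed registry: remembers each predicate's parameter names.
(define predicate-parameters '())       ; alist: procedure -> list of symbols

;; (predicate (?v ...) statement ...) is a lambda whose body is the
;; conjunction (here, append) of its statements.
(define-syntax predicate
  (syntax-rules ()
    ((_ (var ...) statement ...)
     (let ((proc (lambda (var ...) (append statement ...))))
       (set! predicate-parameters
             (cons (cons proc '(var ...)) predicate-parameters))
       proc))))

(define (write-term term port)
  (if (variable? term)
      (write-string (symbol->string (variable-name term)) port)
      (begin (write-string "<" port)
             (write-string (doppelganger-uri term) port)
             (write-string ">" port))))

;; Apply the predicate to query variables named after its parameters,
;; then wrap the resulting triples in a SELECT.
(define (write-predicate-as-select pred port)
  (let ((entry (assq pred predicate-parameters)))
    (unless entry (error "not a predicate" pred))
    (let ((vars (map make-variable (cdr entry))))
      (write-string "select" port)
      (for-each (lambda (v) (write-string " " port) (write-term v port)) vars)
      (write-string " where {" port)
      (for-each (lambda (triple)
                  (for-each (lambda (t) (write-string " " port) (write-term t port))
                            triple)
                  (write-string " ." port))
                (apply pred vars))
      (write-string " }" port))))

(define city (class-named "http://example.com/City"))
(define ville (predicate (?x) (city ?x)))
(write-predicate-as-select ville (current-output-port))
;; select ?x where { ?x <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://example.com/City> . }
```

Because the macro records the parameter names, the select clause can reuse them (?x here), while (ville paris) still works as an ordinary statement constructor.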

RDF and OWL only provide syntax for applications of one- and two-place predicates (classes and properties), whereas SPARQL queries express N-place predicates, and a result set consists of the argument tuples for which the predicate is satisfied. We aren't used to thinking of these as the same kind of thing, but they are in the present framework. The correspondence is easiest to see in the interconvertibility of many one-place predicates expressed as SPARQL queries with class expressions written in OWL. For example, the DL class

    hasCapital value paris

(written here in Manchester syntax) expresses the same class as the SPARQL query

    select ?country where { ?country hasCapital paris . }

The net effect of treating RDF predicates (classes, properties) and SPARQL predicates (SELECT queries) uniformly is that it is easy to build up SPARQL queries incrementally, by augmenting and combining them in new ways. There is also no distinction between the way a "primitive" statement and a "derived" one are written: if I say (p x y), then p could be rendered as a URI, or could lead to substitution into a template ("macro" expansion). The code that uses p doesn't need to know which it is.
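The point about primitive and derived predicates can be shown with plain Scheme procedures. In this sketch terms are just symbols, a statement is a list of triples, and capital-city-of is a hypothetical derived predicate that expands into a conjunction of two primitive ones; callers apply it exactly as they apply has-capital.

```scheme
(import (scheme base))

;; Hypothetical toy model: terms are plain symbols and a statement is a
;; list of triples, so conjunction is just append.
(define (conjunction . statements) (apply append statements))

;; "Primitive" predicates: each application yields a single triple.
(define (has-capital x y) (list (list x 'hasCapital y)))
(define (city x) (list (list x 'rdf:type 'City)))

;; A "derived" predicate defined by template expansion: its applications
;; expand into a conjunction of primitive statements.
(define (capital-city-of country c)
  (conjunction (has-capital country c) (city c)))

;; Callers apply both kinds the same way: (p x y).
(has-capital 'france 'paris)      ; => ((france hasCapital paris))
(capital-city-of 'france 'paris)  ; => ((france hasCapital paris) (paris rdf:type City))
```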

The advantage of embedding this in Scheme, as opposed to just creating an S-expression syntax for RDF (as LSW does), is that we get to leverage all of Scheme's features in the construction of queries without having to hop back and forth from one language to the other. In particular, we can write statement- and query-generating macros that can be used directly in the places where they're needed, not just in the external framework.

Consulting a SPARQL endpoint
1. Define a SPARQL endpoint (query answerer) residing on the web:

    (define endpoint
      (web-endpoint "http://sparql.neurocommons.org/sparql"))

2. Designate something whose properties we want to see (for the example query that follows):

    (define plasmid-1
      (individual-named "http://purl.org/science/plasmid/addgene/1827"))

3. Pose the query to the endpoint:

    (pose-query endpoint (predicate (?p ?o) (?p plasmid-1 ?o)))

4. Examine the result set (Scheme 48). This is not the desired form: for consistency with the lambda analogy, it would be better for the values to be positional, not named. (We can always tack the names back on later if we need to.) Also, the URIs should be turned into doppelgangers.

    ,open pp
    (p ##)
    (((p "http://www.w3.org/2000/01/rdf-schema#subClassOf" uri)
      (o "http://purl.org/science/owl/sciencecommons/synthetic_plasmid" uri))
     ((p "http://www.w3.org/2000/01/rdf-schema#label" uri)
      (o "FRAP" literal #f))
     ((p "http://purl.org/science/owl/sciencecommons/has_addgene_identifier" uri)
      (o "1827" literal #f))
     ((p "http://purl.org/science/owl/sciencecommons/carries_DNA_described_by" uri)
      (o "http://purl.org/commons/record/ncbi_gene/14182" uri))
     ((p "http://purl.org/science/owl/sciencecommons/is_described_in" uri)
      (o "http://purl.org/science/article/pmid/1309590" uri))
     ((p "http://purl.org/science/owl/sciencecommons/has_offer" uri)
      (o "nodeID://1000010000" literal #f))
     ((p "http://www.obofoundry.org/ro/ro.owl#part_of" uri)
      (o "http://purl.org/science/owl/sciencecommons/synthetic_plasmid" uri)))

   Better would be a list of two-element lists:

    ((#{Doppelganger } #{Doppelganger })
     (#{Doppelganger } "FRAP")
     ...)

And don't look too closely at this RDF: the modeling is abysmal (e.g. a plasmid class is not the same as an offer to sell some of its members).
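The "better" positional form described in step 4 could be obtained by post-processing the result set. This is a minimal sketch, assuming each binding has the shape (name value type ...) seen in the Scheme 48 output and using a stand-in doppelganger record; the names row->tuple and result-set->tuples are hypothetical.

```scheme
(import (scheme base))

;; Hypothetical stand-in for the library's doppelganger record.
(define-record-type <doppelganger>
  (make-doppelganger uri) doppelganger? (uri doppelganger-uri))

;; Turn one row of named bindings into a positional list, ordered like VARS.
;; URI values become doppelgangers; literal values stay strings.
(define (row->tuple vars row)
  (map (lambda (var)
         (let ((binding (assq var row)))
           (cond ((not binding) #f)        ; variable left unbound in this row
                 ((eq? (list-ref binding 2) 'uri)
                  (make-doppelganger (cadr binding)))
                 (else (cadr binding)))))
       vars))

(define (result-set->tuples vars rows)
  (map (lambda (row) (row->tuple vars row)) rows))

;; The first two rows of the result set from step 4.
(define rows
  '(((p "http://www.w3.org/2000/01/rdf-schema#subClassOf" uri)
     (o "http://purl.org/science/owl/sciencecommons/synthetic_plasmid" uri))
    ((p "http://www.w3.org/2000/01/rdf-schema#label" uri)
     (o "FRAP" literal #f))))

(define tuples (result-set->tuples '(p o) rows))
;; tuples is a list of two-element lists: (doppelganger doppelganger) for
;; the first row and (doppelganger "FRAP") for the second.
```

Passing the variable order explicitly, here (p o) taken from the predicate's parameter list, is what makes the tuples line up with the predicate's arguments, in keeping with the lambda analogy.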