URI documentation protocol

How should one go about finding documentation that explains what a URI is meant to name (or "denote"), in cases where this is needed and the documentation is not already at hand? HTTP was designed to get you a web page, not to get information about naming, so the following protocol is less elegant than one designed with naming in mind would be.

(URI documentation generally consists of a short description of what is to be named by the URI, but may also contain information about the status of the documentation itself, such as authorship or its progress along a review track.)

If you're doing bulk processing of URI documentation, you may be better off doing bulk downloads or SPARQL queries on an appropriate SPARQL endpoint, as large numbers of probes will be inefficient and will load servers, usually unnecessarily.

The following protocol is designed to be forgiving enough to grandfather many URIs already in use on the Semantic Web, such as those in the RDF Schema vocabulary and Dublin Core, while strict enough to support our URI requirements. It coincides with HTTP in the absence of overrides, responses containing a Location: header, and 200 responses.

Draft protocol
Note that this protocol is not "authoritative": for the use of URIs in communication (i.e. to denote), an authoritative source of correct URI documentation is inherently impossible to establish using any technical protocol.

1. If there is a documentation source override rule that applies to the URI, apply it. For a documentation URI mapping rule, proceed to step 6. For a replacement URI mapping rule, go to step 2.

2. If the URI is not an http: URI, documentation access is not specified by this protocol.

3. If the URI contains a fragment identifier (#), strip the # and following characters to obtain a documentation URI. Go to step 6 and be prepared to find the documentation you want mixed in with documentation for a lot of other URIs.

4. If there is no #, do a HEAD request with Accept: application/rdf+xml, application/xhtml+xml, text/html. (The higher-priority request for RDF is necessary in order to encourage content negotiation in the direction of a 303 (step 5b), and request for HTML is necessary in case a server responds with a 200 (step 5c).)

5. Determine a documentation URI using one or more of the following methods:

5a. If a response has a response header of the form Link: <...>; rel="meta", take the link target to be the documentation URI. ("meta" should be replaced by a specific URI. See Link header.)

5b. If a response has status code 303, assume the Location: URI to be the documentation URI. (This use of 303 is a convention in use in Semantic Web contexts. The HTTP protocol does not require that the redirected-to URI be URI documentation, so the result should be treated with caution.)

5c. If the response has status code 200 and an HTML media type (Content-Type: text/html or application/xhtml+xml), do a GET, specifying that media type in the Accept header, to retrieve the content. Look for a <link rel="meta" href="..."> element under the document's <head> element, and take its target to be the documentation URI. Discard the rest of the 200 response (unless you are also interested in it for its own sake). ("meta" should be replaced by a specific URI. See Link header.)

(TBD: Specify a theory of how to use GRDDL to convert HTML to RDF.)

5d. If the response specifies a redirect (30x other than 303), follow it per the HTTP protocol and repeat step 5.

(Cooperating servers must arrange for documentation found via Link: to be the same documentation as that found via 303 or via an HTML link element, whenever more than one method leads to documentation.)
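
As a rough illustration of step 5, the following Python sketch classifies a single HTTP response given its status code and headers, and extracts a link target from an HTML head. It is a sketch under stated assumptions, not a reference implementation: it uses the draft's placeholder relation "meta", it tries the methods in the order 5a, 5b, 5c, 5d (the draft does not fix an order), and its Link header parsing is deliberately simplified. The names classify_response, link_header_target, and html_link_target are hypothetical.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumption: "meta" is the draft's placeholder; a specific relation URI
# is meant to replace it.
DOC_REL = "meta"
HTML_TYPES = {"text/html", "application/xhtml+xml"}


def link_header_target(value, rel=DOC_REL):
    """Step 5a (simplified): return the <target> whose rel includes `rel`.

    This does not handle commas inside quoted parameter values.
    """
    for target, params in re.findall(r"<([^>]*)>([^,]*)", value):
        match = re.search(r';\s*rel\s*=\s*"?([^";]*)"?', params)
        if match and rel in match.group(1).split():
            return target
    return None


class HeadLinkFinder(HTMLParser):
    """Step 5c: remember the first matching <link> seen inside <head>."""

    def __init__(self, rel):
        super().__init__()
        self.rel = rel
        self.in_head = False
        self.href = None

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "link" and self.in_head and self.href is None:
            attributes = dict(attrs)
            rels = (attributes.get("rel") or "").split()
            if self.rel in rels and attributes.get("href"):
                self.href = attributes["href"]

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def html_link_target(html_text, rel=DOC_REL):
    """Return the href of the documentation link in the head, or None."""
    finder = HeadLinkFinder(rel)
    finder.feed(html_text)
    return finder.href


def classify_response(request_uri, status, headers):
    """Return (outcome, uri) for one response to the step 4 probe."""
    lowered = {name.lower(): value for name, value in headers.items()}
    target = link_header_target(lowered.get("link", ""))
    if target:  # 5a: Link header
        return ("documentation", urljoin(request_uri, target))
    location = lowered.get("location")
    if status == 303 and location:  # 5b: 303 See Other
        return ("documentation", urljoin(request_uri, location))
    media_type = lowered.get("content-type", "").split(";")[0].strip().lower()
    if status == 200 and media_type in HTML_TYPES:  # 5c: GET and inspect head
        return ("fetch-html", request_uri)
    if 300 <= status < 400 and location:  # 5d: follow and repeat step 5
        return ("redirect", urljoin(request_uri, location))
    return ("none", None)
```

A caller would loop on the "redirect" outcome, and on "fetch-html" would GET the page and pass its text to html_link_target, resolving a relative result against the page URI.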

6. At this point you should have a second URI (the "documentation URI"). Do a GET of the documentation URI, following redirects as needed, and see whether the response constitutes documentation for the URI. Specify Accept: application/rdf+xml for machine-readable documentation.

You should end up with URI documentation at this point. As we are recycling the HTTP protocol for an unintended purpose, you might have something else instead, so the result should be treated gingerly.
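
The request-building side of the protocol (steps 2 to 4 and 6) can be sketched in Python as follows. This is only an illustration under assumptions: override rules from step 1 are not modeled, "not an http: URI" is read literally (so https: URIs are treated as unspecified), and the function names plan_first_request and documentation_request are hypothetical. The code only constructs request objects and sends nothing.

```python
from urllib.parse import urldefrag, urlsplit
from urllib.request import Request

# Accept values taken from steps 4 and 6 of the draft.
PROBE_ACCEPT = "application/rdf+xml, application/xhtml+xml, text/html"
DOC_ACCEPT = "application/rdf+xml"


def documentation_request(doc_uri):
    """Step 6: a GET for machine-readable documentation."""
    return Request(doc_uri, method="GET", headers={"Accept": DOC_ACCEPT})


def plan_first_request(uri):
    """Steps 2-4: return (action, request) for a URI with no override."""
    if urlsplit(uri).scheme.lower() != "http":
        # Step 2: access is not specified by this protocol.
        return ("unspecified", None)
    if "#" in uri:
        # Step 3: strip the fragment and go straight to step 6.
        doc_uri = urldefrag(uri).url
        return ("get-documentation", documentation_request(doc_uri))
    # Step 4: probe with HEAD, listing RDF first in the Accept header.
    probe = Request(uri, method="HEAD", headers={"Accept": PROBE_ACCEPT})
    return ("probe", probe)
```

For example, plan_first_request("http://example.org/vocab#Term") yields a GET for http://example.org/vocab. Note that urllib.request.urlopen follows redirects by default, so a client that wants to observe a 303 itself would need an opener that does not.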

We ask servers to follow certain documentation quality standards in the documentation that they deliver. In particular, documentation should be explicit about what the URI is supposed to denote.