TDWG applicability statement comments

Re TDWG Life Sciences Identifiers Applicability Statement:

The linked draft (a Word document) has two parts, the TDWG GUID Applicability Statement and the TDWG Life Sciences Identifiers (LSID) Applicability Statement.

Re GUID Applicability Statement
Under motivation it says LSID "does not work by default with the semantic web technologies". I don't think this is true, as OWL, RDF, SPARQL, SWRL, and their respective implementations such as triple stores are all perfectly happy to use any kind of URI, including a urn:lsid: URI. It is only when you try to do "linked data" that the HTTP requirement comes into play. It is worth making an effort to ensure that your readers are not confused by this.

"has lead to" typo, should be "has led to"

"Object" is used but not defined. (It is defined in the LSID AS.) Make it clear whether you mean just "data object" or a more inclusive category that includes "data object" as a special case.

The statement that a UUID "is not resolvable over the Internet" is not sufficiently precise. Certainly some UUIDs are resolvable given adequate information about their stewardship. For example, if you know that the UUID came from site example.org, then you might also know of a service hosted at example.org that can resolve all UUIDs coming from example.org, and thus the given UUID in particular. What you want to say is that there is no widely known and used service or protocol for resolving UUIDs, whereas for http: URIs there is one.
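To illustrate the point about stewardship, here is a minimal sketch in Python. The registry, the resolver URL template, and the function name are all hypothetical; nothing like them is defined by the draft or by any UUID specification. The only claim is that a UUID becomes resolvable once out-of-band knowledge of its steward is available.

```python
import uuid

# Hypothetical out-of-band knowledge: stewards known to run a resolver,
# mapped to an assumed URL template for that resolver.
KNOWN_RESOLVERS = {
    "example.org": "http://example.org/resolve/{uuid}",
}

def resolver_url(identifier, steward=None):
    """Return a URL where the UUID might be resolved, or None.

    The UUID itself carries no locator, so without a known steward
    there is nothing to look up.
    """
    parsed = uuid.UUID(identifier)  # raises ValueError if not a UUID
    template = KNOWN_RESOLVERS.get(steward)
    if template is None:
        return None
    return template.format(uuid=str(parsed))
```

Called with only the UUID, the function returns None; called with the UUID and the steward "example.org", it returns a URL under the assumed template.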

The organization and exposition of section 1 should make the "ontology" of GUIDs clearer. The correct classification goes like this:

URI
  http: URI
    PURL
      OCLC PURL (= http://purl.org/ URI)
    http:-proxied DOI (= http://dx.doi.org/ URI)
    http:-proxied LSID
    other kind of http: URI
  urn: URI
    urn:lsid: URI
    other kind of urn: URI
  other kind of URI (mailto:, tag:, ...)
Handle
  DOI
  Other kind of handle
UUID
Other kind of GUID

This organization makes it clear which GUIDs can be used with RDF (namely URIs), which are resolvable using HTTP (the various kinds of http: URI), and also clarifies that any client of the Handle system is automatically able to deal with DOIs.
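The classification can also be written down as a rough Python sketch. The function names and the matching heuristics are my own assumptions for illustration: in particular, recognizing a proxied LSID by the substring "urn:lsid:" in the URL, and recognizing a bare handle by a numeric prefix followed by a slash, are simplifications rather than anything the draft or the relevant specs prescribe.

```python
import re

UUID_RE = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$")
HANDLE_RE = re.compile(r"^(\d+(?:\.\d+)*)/.+$")   # prefix/suffix
SCHEME_RE = re.compile(r"^[a-z][a-z0-9+.-]*:")    # any URI scheme

def classify_guid(guid):
    """Return the path from the top of the classification to the leaf."""
    g = guid.strip().lower()
    if g.startswith("http:"):
        if g.startswith("http://purl.org/"):
            return ["URI", "http: URI", "PURL", "OCLC PURL"]
        if g.startswith("http://dx.doi.org/"):
            return ["URI", "http: URI", "http:-proxied DOI"]
        if "urn:lsid:" in g:  # assumed proxy convention
            return ["URI", "http: URI", "http:-proxied LSID"]
        return ["URI", "http: URI", "other kind of http: URI"]
    if g.startswith("urn:"):
        if g.startswith("urn:lsid:"):
            return ["URI", "urn: URI", "urn:lsid: URI"]
        return ["URI", "urn: URI", "other kind of urn: URI"]
    if SCHEME_RE.match(g):
        return ["URI", "other kind of URI"]
    handle = HANDLE_RE.match(g)
    if handle:
        # DOIs are handles whose prefix begins with "10."
        if handle.group(1).startswith("10."):
            return ["Handle", "DOI"]
        return ["Handle", "other kind of handle"]
    if UUID_RE.match(g):
        return ["UUID"]
    return ["other kind of GUID"]

def usable_in_rdf(guid):
    """Any URI can be used with RDF."""
    return classify_guid(guid)[0] == "URI"

def resolvable_over_http(guid):
    """Only the http: URI branch is resolvable using HTTP."""
    return "http: URI" in classify_guid(guid)
```

With this sketch, a urn:lsid: URI is usable in RDF but not resolvable over HTTP, while a bare DOI lands under Handle and is neither until it is proxied through an http: URI.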

The section "HTTP URI" is very careful not to mention the linked data case, where it is used to identify an object that is not a "network resource". If you want to support the linked data case then failing to mention this at this point of the presentation will be very confusing. You would need to say that while the http: URI is used to identify (or name) an arbitrary object, it resolves to metadata - just as in the getMetadata case of LSID.

Re LSID section: "uses the domain name system to locate a resolution service" ... I don't think this is part of the LSID spec per se, but rather a possible usage pattern for the spec. SRV is a common heuristic or hint, a SHOULD rather than a MUST.
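A small Python sketch of the usage pattern I have in mind follows. It assumes the urn:lsid:authority:namespace:object[:revision] layout and the "_lsid._tcp." SRV naming convention as I understand them; the function names and the overrides table are hypothetical. The point is only that a client may locate a resolver from local configuration and treat the DNS lookup as a fallback hint.

```python
from collections import namedtuple

LSID = namedtuple("LSID", "authority namespace object_id revision")

def parse_lsid(text):
    """Split an LSID into its components (revision is optional)."""
    parts = text.split(":")
    if (len(parts) not in (5, 6)
            or parts[0].lower() != "urn"
            or parts[1].lower() != "lsid"):
        raise ValueError("not an LSID: %r" % text)
    revision = parts[5] if len(parts) == 6 else None
    return LSID(parts[2], parts[3], parts[4], revision)

def srv_hint(lsid):
    """DNS name a client could query for an SRV record (a hint only)."""
    return "_lsid._tcp." + lsid.authority.lower()

def locate_resolver(lsid, overrides=None):
    """Prefer locally configured resolvers; fall back to the SRV hint.

    `overrides` is a hypothetical mapping from authority to resolver URL.
    No DNS query is performed here; the caller decides what to do next.
    """
    overrides = overrides or {}
    if lsid.authority in overrides:
        return ("configured", overrides[lsid.authority])
    return ("srv", srv_hint(lsid))
```

A client with an entry for the authority in its overrides never touches DNS at all, which is consistent with reading the SRV lookup as a SHOULD.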

Re rec. 3 "A GUID must be assigned to a single object" -- this is confused. The second paragraph clearly doesn't apply and should be merged into rec. 4. But there are two points you could make here. One is that "important" objects should be assigned GUIDs. (Not every object needs a GUID; only the ones that we need to "talk about" do.) The other possible point is that any GUID MUST be assigned to only one object - I don't think you say that anywhere, although it is really part of the definition of GUID.

I think you can safely say in this section (recommendation 3) that each agent that assigns GUIDs SHOULD assign at most one GUID to any particular object. This is stronger than what you have.

Re rec. 5, which I have already complained about: sometimes you need to assign a GUID to an object for which you are not an authority, because the appropriate authority is not interested in doing so, or is not doing so in a way that supports persistence. I think you may be referring to authority over the metadata record, but authority over metadata does not constitute authority over the object described by it.

Re "10. The default metadata response format should be RDF serialized as XML": are you eating your own dog food? Consider the URI http://www.tdwg.org/standards/150, described as a "Persistent URL" in the document. If this URI is to be a GUID, as seems to be the intent, then it SHOULD resolve to RDF serialized as XML. Does it?
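One way to check this is sketched below in Python, using only the standard library. The helper names are mine, and I am assuming that "RDF serialized as XML" means the application/rdf+xml media type. I have not run this against the TDWG URI, so I make no claim about what it returns.

```python
from urllib.request import Request

RDF_XML = "application/rdf+xml"  # assumed media type for RDF/XML

def metadata_request(uri, accept=RDF_XML):
    """Build a GET request for the URI.

    Pass accept=None to send no Accept header, which tests the stricter
    reading that RDF/XML is the *default* response format.
    """
    headers = {"Accept": accept} if accept else {}
    return Request(uri, headers=headers)

def is_rdf_xml(content_type):
    """True if a Content-Type header value denotes RDF/XML."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type == RDF_XML
```

To use it, pass the request to urllib.request.urlopen and feed the response's Content-Type header to is_rdf_xml.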

In addition, while you admit the handle system (and DOIs) as acceptable GUIDs, the protocols do not provide a widely agreed way to provide RDF, so there is no way someone using the handle system can reliably follow your SHOULD recommendation to deliver metadata in RDF.

Re LSID Applicability Statement
In "motivation", where it says "will support" - do you want to use present tense?

"we use the term object to refer to an entity or information about it" - "or" and "it" (what?) are very confusing. A better conception is that there are objects (elsewhere called entities, things, resources), that this term is hard or impossible to define, but that it is meant to subsume anything that can be named, including specimens, locations, agents, data records, metadata records, and publications.

(Or, if you prefer, limit your treatment to data objects and their metadata.)

Then you would say "organizations which disseminate data objects or metadata records about objects" instead of "organizations which disseminate objects".

The language you have perpetuates the confusion between a thing and a description of a thing. This confusion can't be tolerated if RDF is going to be useful for any kind of reasoning, whether formal or informal.

The phrase "that are likely to be moved to different servers" is not strong enough. There should be separation if there's any reasonable possibility of a future need to separate the namespace.

11. Where it says "single object" please say "single data object" as other kinds of objects (e.g. specimens) are not generally versioned.

Your entire discussion of versioning could be moved to the GUID applicability statement; it is unrelated to the use of LSIDs and is good practice for data object versioning regardless of what GUID syntax is used or what access protocol is used to obtain the metadata record.

30. "Providers should not encode data in formats such as XML" -- this sounds like "do not use XML" which is not what you mean. If you add the word "dynamically" I think it becomes clearer what's going on: "Providers should not dynamically encode data in formats such as XML".

Consider a recommendation that LSID metadata include an assertion of a checksum, as an additional reminder to providers that the data must never change. This could be validated by tools. It would not guarantee immutability, but would make it a bit harder to violate the rule.
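As a rough sketch of what such a tool check could look like, here is a Python example. The dictionary shape, the choice of SHA-256, and the function names are my assumptions; the draft does not define a checksum property, and a real recommendation would need to pick a vocabulary term for it.

```python
import hashlib

def checksum_assertion(data, algorithm="sha256"):
    """Build the checksum statement a provider would publish in metadata."""
    digest = hashlib.new(algorithm, data).hexdigest()
    return {"algorithm": algorithm, "value": digest}

def data_unchanged(data, assertion):
    """Recompute the digest of the returned data and compare it with
    the value asserted in the metadata."""
    digest = hashlib.new(assertion["algorithm"], data).hexdigest()
    return digest == assertion["value"]
```

A validator would fetch the data and the metadata for the same LSID, then call data_unchanged; a False result means either the data or the metadata has been altered since the assertion was made.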

32. You should make the reason for this clear: The LSID identifies something. When it identifies a "piece of data" (this is the terminology used in the LSID spec) then the piece of data that's identified is the one returned by getData. When it identifies something else then getData returns zero-length content. (I'm not sure what the content-type should be or how to distinguish this from zero-length content, but that's another story.) In both cases, the metadata record describes what's identified. The identified object is not the metadata.

33. metadata may change -- this would be a place to reinforce that the LSID must continue to refer to the same object (i.e. MUST NOT be made to refer to a new object as a result of a change to the metadata). Again, this is a general GUID good practice and should be part of the GUID AS.

37. Not sure what "the biodiversity information domain" means - are specimens in "the biodiversity information domain", or just data objects? I think you should change this to "the biodiversity domain" to make it inclusive of things such as specimens that are not information.

"bespoke (custom-made) ontologies should not be used" -- clarify that they should not be used in place of standard ontologies. Certainly custom ontologies must be used in order to communicate non-standard information; this is the fuel of innovation.

Again, this is not LSID specific.

15.1. Clarify "When the LSID identifies the information object being displayed", and change "an object that is related to the main object being displayed" to "an object that is related to the information object being displayed".

"identifier for this data item" - If you want to encourage LSIDs to identify objects that are not data objects, you must change this to "identifier for this object". (and after that "this data item" should be "this object".)

Re the unnumbered recommendation 39.5 "In HTML web pages, LSIDs that...": please provide example HTML so that someone looking at a printed copy will know what you're talking about.

Again, change "data item" to "object".

42. "LSIDs in its standard form" - fix grammar

"Universal Resource Identifiers" - this is incorrect, the correct word is "uniform" not "universal" - please see RFC 3986.

15.4. "Standard clients do not need to dereference the identifier from the rdf:about property because they already have an RDF representation of the object." - this shows a misunderstanding of RDF. The about= can specify any URI, so sometimes the client will already have some RDF; sometimes it doesn't.

There is a second error here, which is that the RDF in hand would normally not be a representation of the object, but rather merely a description of it.

"retrieve the object" should be "retrieve the object's metadata"

43. I have already commented that you need owl:equivalentClass for classes and similarly for properties.

"representation of an object" -> "description of an object" again - please search the whole document for misuses of "representation"

"access to an object" -- again, you don't access the object (e.g. if it's a specimen), you access its metadata.

"a network of objects" -- again, you navigate the metadata, not the objects. Say "a network of metadata records" or "a network of object descriptions".