JAR IEO notes

This is Jonathan's attempt to justify the use of URIs or DOIs to name documents that change. It addresses two needs: (1) the Neurocommons site specifies a particular semantic web architecture, and that architecture needs a way to justify the practice of changing URI documentation; and (2) the W3C TAG's web architecture needs a solid theory to reconcile the view of an information resource as an abstract document with the operational view that the information resource is whatever is retrieved.

Very early thoughts, not to be taken too seriously, especially the part at the end about taxa.

The story
A document in the physical world has the same character as any other physical object: it is created at some time, changes in various ways, and at some point ceases to be. A particular book is printed, purchased, used, and damaged, and ultimately may become compost.

(Alan R thinks document is the wrong metaphor for the story I'm about to tell, as one wouldn't say that http://news.google.com/ was a document (unless you were TimBL). Maybe a slate - as in what a 19th-century schoolchild writes on with chalk? Or a small whiteboard.)

But documents have a distinctive property: they can be read and therefore copied. A copy is of course a copy of the document as it is at the time of copying; if the original later changes, the copy does not necessarily change with it. And the copy resembles the original only in respects significant for communication, not in other details such as the constitution of the ink used in printing. Nevertheless the copy is as good as the original in some sense: the two are confusable.

(Note that confusability is a function of the uses to which one will put the document or the copy.)

Define the document's message to be whatever it is about the document at a given time that determines whether it will be confusable with another document, and say that the document carries that message. That is, D1 and D2 are confusable iff they carry the same message.
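
The definition above can be sketched in code. This is only an illustration under stated assumptions: the names PhysicalDocument, message, and confusable are invented here, and "message" is arbitrarily taken to be whitespace-normalised text. Any other choice of what counts as significant (per the note on uses above) would give a different confusability relation.

```python
from dataclasses import dataclass


@dataclass
class PhysicalDocument:
    """A hypothetical physical document at one moment in time."""
    text: str  # what is written on it
    ink: str   # a physical detail that does not matter for communication


def message(doc: PhysicalDocument) -> str:
    """Assumed notion of message: the text with runs of whitespace collapsed.

    Details such as the ink are deliberately ignored.
    """
    return " ".join(doc.text.split())


def confusable(d1: PhysicalDocument, d2: PhysicalDocument) -> bool:
    """D1 and D2 are confusable iff they carry the same message."""
    return message(d1) == message(d2)


original = PhysicalDocument("Hello,  world", ink="iron gall")
copy = PhysicalDocument("Hello, world", ink="toner")
altered = PhysicalDocument("Hello, moon", ink="toner")

print(confusable(original, copy))     # same message, different ink
print(confusable(original, altered))  # different message
```

Because confusability is defined as equality of messages, it is automatically an equivalence relation; the interesting design choice is entirely in what message() keeps and what it throws away.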

Physical documents may change over time, as for example the sign-in book at an art gallery. In this case a physical document that is a copy at one time may or may not be a copy later on. If changes are kept in sync, so that the two documents always carry the same message, then it becomes difficult to resist saying that the two changing documents are "the same thing". If one sign-in book disappears, but a copy has been kept up to date, then the copy may be considered the "same" as the original, because confusability has been preserved through time.

Actively updated copies might be compared to broadcast communication, in which many physical displays carry the same signal (a time-varying message), or to a Web page (well, certain ones), where many physical displays carry the same message because they're all "visiting" the page, and if the page changes, the displays (if refreshed) will show the new message.

This suggests an abstraction that one might call an abstract document. At any given time an abstract document is associated with a message (in a manner analogous to the way a physical document carries one), potentially a different message at different times; its identity is determined by that time-dependent association.
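
One way to make this concrete is to model an abstract document as a function from time to message, and to call two abstract documents the same when they agree at every time considered. The sketch below is an assumption-laden illustration, not a proposed formalisation: integer time steps, string messages, and the names from_history and same_abstract_document are all invented here.

```python
from typing import Callable, Dict, Iterable

Message = str
# An abstract document, modelled as a time-dependent association with a message.
AbstractDocument = Callable[[int], Message]


def from_history(history: Dict[int, Message]) -> AbstractDocument:
    """Build an abstract document from a record of changes.

    The message at time t is the most recent entry at or before t;
    before the first entry the message is assumed to be empty.
    """
    def at(t: int) -> Message:
        earlier = [s for s in history if s <= t]
        return history[max(earlier)] if earlier else ""
    return at


def same_abstract_document(a: AbstractDocument, b: AbstractDocument,
                           times: Iterable[int]) -> bool:
    """True iff a and b are associated with the same message at every given time."""
    return all(a(t) == b(t) for t in times)


# A gallery sign-in book, a copy kept in sync, and a copy that stopped being updated.
book = from_history({0: "", 1: "Alice", 3: "Alice\nBob"})
synced = from_history({0: "", 1: "Alice", 3: "Alice\nBob"})
stale = from_history({0: "", 1: "Alice"})

print(same_abstract_document(book, synced, range(5)))
print(same_abstract_document(book, stale, range(5)))
print(same_abstract_document(book, stale, range(3)))
```

Note that the stale copy is indistinguishable from the book over the first three time steps and only diverges once the book changes again, which mirrors the point that a copy at one time may or may not be a copy later on.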

(See also Information Entity Ontology, First IEO workshop, Document variability)

What does this say about biological populations?
In the above, substitute as follows:
 * (physical) document -> organism
 * message -> genome / phenome / species / taxon?
 * copy (verb) -> propagate
 * copy (noun) -> offspring
 * change -> evolve
 * level of detail -> level in taxonomic hierarchy
 * abstract document -> population/species