IAP 2009 notes 2009-01-15

Legal issues in ontology and data use
Jonathan's notes on Thinh's presentation.


 * Copyright
 * License
 * Licensor definition
 * Licensee definition
 * Contract - mutual promises
 * License is different from contract. FSF says GPL is a license but not a contract.


 * Copyright starts with the invention of the printing press.
 * 1709 Statute of Anne in England
 * US put copyright into the Constitution
 * 1790 Copyright Act - protection for 14 years for maps, charts, books
 * 1909 extended it to 28 years
 * 1976 act (effective 1978) - life of author + 50
 * 1998 - life of author + 70; for corporate works, 95 years after publication or 120 years from creation, whichever is shorter
 * Berne Convention normalizes rights internationally and eliminates the need for the (c) symbol and for registration - no loss of rights if you don't use the symbol (though the symbol does help with damage awards)


 * 17 USC 106 - exclusive rights in copyrighted works - e.g. the right of public display.
 * Recent 6th exclusive right: public performance of sound recordings by digital audio transmission (e.g. streaming music online)
 * Work for hire is corporate work - copyright belongs to corporation

Q: How does joint copyright work at MIT? A: Parties can agree... not familiar with MIT

What does copyright protect? Title 17 sec 102 of the copyright act lists 8 categories of works that can be protected. Not necessarily an exhaustive list. Section 117 says software is protectable. Work must be in a "tangible medium". RAM isn't tangible... technology is always ahead of copyright law.

Work must be "original" - this is a constitutional requirement. Feist Publications vs. Rural Telephone: a phone book is not original, and therefore not protectable. Originality has two parts. #1: the work is not copied from another work. #2: it must show creativity, although the bar is low.

Computer Associates vs. Altai - job scheduling software. Altai was accused of copying the design. The court's test has 3 parts: (1) Abstraction - break the program into levels of abstraction; source code is the lowest layer, and the highest levels may not be protectable. (2) Filtration - identify nonprotectable elements and exclude them: those dictated by efficiency (doing it the most efficient way), those forced on you by external factors (e.g. adherence to a standard), and those in the public domain. (3) Comparison - are the remaining similarities substantial and important?

Problem: The test is not predictable - you can't apply it ahead of time.

Q: Does it matter whether putative infringement is for profit or not? A: This relates to fair use; this is just one factor considered in a fair use determination.

Borland vs. Lotus.

Now apply this to ontologies. Apply the three part test - exclude determined, forced, and unimportant elements. In practice you don't want to litigate because the jury won't understand these issues deeply.

Section 102(b): copyright doesn't apply to an idea, discovery, process, etc. - there has to be originality - and it's the original expression that's protected, not the idea.

Joyce's estate. Rawlings.

What about internationally? An Australian court said that the choice of what to put into the phone directory was an exercise in skill and judgment, and therefore protectable. This is the "sweat of the brow" theory of copyright. In Europe, there is sui generis protection for databases - protecting the economic investment. Something different in China.

Question: What about health care ontologies - these are public, right? And not protected? Answer: Not necessarily. Let's talk about them later.

Adding value to public domain material can lead to a protectable work.

Licenses that have been applied to data: GPL, BSD, CC share-alike.

American Dental Assoc. vs. Delta Dental. The 7th Circuit reversed the lower court: a classification is a creative endeavor, not just a collection of bits of reality.

Similar: AMA vs. Practice Management Group. Complexity: It's government mandated, so not protectable... but court didn't agree with this argument. Similar: WestLaw's pagination is protectable.

Red Hat uses trademark, not copyright, to protect its "product". Consider this if you're distributing data or ontologies.

Clearest from the point of view of public policy? Consider dedicating to the public domain. One way to do this is CC0.

Attribution stacking is a problem. The administrative burden of preserving attribution may inhibit large-scale integration. Plan for the future.

Other options
Four ways to protect: copyright, trademark, patent, trade secret.

Patent protection requires registration. Must disclose prior art. Invention must be novel, nonobvious. Copyright cannot protect against independent creation of same thing, while patent can. Patents: devices, processes, compositions (of matter), plants/microorganisms.

Trade secret has no federal statute. Trade secret litigation is big business in California. You have to take reasonable precautions to keep something secret; otherwise trade secret law doesn't apply.

Trademark is a kind of consumer protection - designed to keep consumers from being misled by inferior competitors. Later, it was extended to protect companies from unfair competition.

In most countries, you need to register. In the US you have a choice - you can rely on common law (state) protection or you can register and get protection federally.

You have to police your trademarks - send letters to infringers.

Q: What about domain names? A: People have tried to protect the name under trademark law.

Continuing with Federal Tax 1798 data
Begun Monday.

Make an ontology for the two data sets using Protege 4.

http://neurocommons.org/w/images/c/cd/Aire_township_1798.xml [[Media:Aire_township_1798.xml]]

Cost/benefit analysis is relative to goal. Goal might be to publish the data so that it can be used by other people, minimizing their integration effort. Or it might be to integrate a set of sources that you've collected, for your own uses only. These lead to very different tradeoffs.

Attempting to upload the .owl file into the wiki... failed, not sure why.

Using LSW to convert the source file; checking consistency using Pellet (from LSW); attempting to view in LSW, but there are character encoding issues (again!).

Alan is sending the transcript from his LSW session.


 * [[Media:TaxRecords.owl|TaxRecords.owl]]
 * [[Media:Aire-data.owl|Aire-data.owl]]
 * [[Media:Annapolis-data.owl|Annapolis-data.owl]]
 * [[Media:to-owl.lisp|to-owl.lisp]]
 * [[Media:Aire township 1798.txt|Aire township 1798.txt]]
 * [[Media:annapolis_1798.txt|annapolis_1798.txt]]

(translate-annapolis)
WARNING: What's this name? - Charles Carroll of Carrollton
WARNING: What's this name? - Mary Rawlings and Robert Miles
WARNING: What's this name? - Charles Carroll of Carrollton
WARNING: What's this name? - Charles Carroll of Carrollton
WARNING: What's this name? - Richardson
WARNING: What's this name? - John Bond and Thomas Browse
WARNING: What's this name? - rented
WARNING: What's this name? - rented
WARNING: What's this name? - rented
WARNING: What's this name? - Robert Koy and William Bishop
WARNING: What's this name? - Thompson
WARNING: What's this name? - James P. Maynard
WARNING: What's this name? - Ann Jackson and Henry Bahre
WARNING: What's this name? - "James Lusby, Battee, Susanna Lusby
WARNING: What's this name? - "Thomas Chambers, William Ferry
WARNING: What's this name? - Darby
WARNING: What's this name? - "William Foxcroft, Willliam B
WARNING: What's this name? - "Simon Rotallich, Jr.
WARNING: What's this name? - 5 Black People
WARNING: What's this name? - Grammar
WARNING: What's this name? - Andrew
WARNING: What's this name? - Anglin

(translate-aire)
WARNING: What's this name? - ROBERT JR RAMSEY
WARNING: What's this name? - JOHN JR OLINGER
WARNING: What's this name? - JOHN SR OLINGER
WARNING: What's this name? - JOHN SR POLK
WARNING: What's this name? - JAMES JR POLK
WARNING: What's this name? - JOHN JR SOOK

(sparql '(:select (?last)
          (?name !rdf:type !tax:Person_Name)
          (?name !tax:has_first_name ?first)
          (?name !tax:has_last_name ?last)
          (:filter (equal (str ?first) "Ann")))
        :kb kb :trace t :values nil :use-reasoner :none)

PREFIX tax:
PREFIX rdf:
SELECT ?last
WHERE {
  ?name rdf:type tax:Person_Name.
  ?name tax:has_first_name ?first.
  ?name tax:has_last_name ?last.
  FILTER (str(?first) = "Ann")}

Results:
"Gaton"
"Wishan"
"Townshend"

(define-ontology foo
  (owl-imports "file:///Users/alanr/TaxRecords/aire-data.owl")
  (owl-imports "file:///Users/alanr/TaxRecords/annapolis-data.owl"))

(check foo) T
 * OWL-DL

(sparql '(:select (?last ?person)
          (?name !rdf:type !tax:Person_Name)
          (?name !tax:has_first_name ?first)
          (?name !tax:has_last_name ?last)
          (?name !tax:name_of ?person)
          (:filter (regex (str ?first) "^A.*")))
        :kb foo :trace t :values nil :use-reasoner :none)

PREFIX tax:
PREFIX rdf:
SELECT ?last ?person
WHERE {
  ?name rdf:type tax:Person_Name.
  ?name tax:has_first_name ?first.
  ?name tax:has_last_name ?last.
  ?name tax:name_of ?person.
  FILTER regex(str(?first),"^A.*")}

Results:
"Gaton"	!tax:Ann_Gaton_49_in_annapolis
"Ridgely"	!tax:Abraham_Ridgely_103_in_annapolis
"Wishan"	!tax:Ann_Wishan_96_in_annapolis
"Townshend"	!tax:Ann_Townshend_20_in_annapolis
"Harmon"	!tax:Alex_Harmon_36_in_annapolis
"BROWN"	!tax:ALEXANDER_BROWN_89_in_aire
"GIFT"	!tax:ADAM_GIFT_164_in_aire
"FREAD"	!tax:ABRAHAM_FREAD_159_in_aire
"GLUNT"	!tax:ANDREW_GLUNT_162_in_aire
"STRONG"	!tax:ALEXANDER_STRONG_302_in_aire
"WILSON"	!tax:ALEXANDER_WILSON_325_in_aire
"GLUNT"	!tax:ADAM_GLUNT_163_in_aire
"SOOK"	!tax:ABRM&JOHN_SOOK_284_in_aire
"MENDENHALL"	!tax:ADAM_MENDENHALL_230_in_aire
"DEWEES"	!tax:ANN_DEWEES_146_in_aire
"WILES"	!tax:AGNES_WILES_80_in_aire
"ISER"	!tax:ANN_ISER_44_in_aire
"ISER"	!tax:ANNE_ISER_190_in_aire
"LAZURE"	!tax:ABRAHAM_LAZURE_208_in_aire
"HUMBERT"	!tax:ADAM_HUMBERT_42_in_aire
"STILL"	!tax:ARCHIBALD_STILL_279_in_aire
"STILL"	!tax:ARCHIBALD_STILL_280_in_aire
"STILL"	!tax:ARCHIBALD_STILL_281_in_aire
"BARNET"	!tax:ABNER_BARNET_86_in_aire
"GORDON"	!tax:ALEXANDER_GORDON_33_in_aire
"WORK"	!tax:ANDREW_WORK_84_in_aire
"FOX"	!tax:ADAM_FOX_157_in_aire
"STEPHENS"	!tax:ABEDNIGO_STEPHENS_75_in_aire
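
For readers who don't have LSW handy, here is a rough Python sketch of what the second query above does: it joins the four triple patterns on the shared ?name node and then applies the regex filter to the first name. This is not how LSW or its SPARQL engine works internally. The triples are a toy sample: the Gaton and GIFT individuals come from the results above, while the name-node identifiers (n1, n2, n3) and the John_Brice individual are made up for illustration.

```python
import re

# Toy data. n1/n2/n3 and tax:John_Brice_2_in_annapolis are hypothetical;
# the other two individuals appear in the query results above.
TRIPLES = [
    ("n1", "rdf:type", "tax:Person_Name"),
    ("n1", "tax:has_first_name", "Ann"),
    ("n1", "tax:has_last_name", "Gaton"),
    ("n1", "tax:name_of", "tax:Ann_Gaton_49_in_annapolis"),
    ("n2", "rdf:type", "tax:Person_Name"),
    ("n2", "tax:has_first_name", "John"),
    ("n2", "tax:has_last_name", "Brice"),
    ("n2", "tax:name_of", "tax:John_Brice_2_in_annapolis"),
    ("n3", "rdf:type", "tax:Person_Name"),
    ("n3", "tax:has_first_name", "ADAM"),
    ("n3", "tax:has_last_name", "GIFT"),
    ("n3", "tax:name_of", "tax:ADAM_GIFT_164_in_aire"),
]


def objects(triples, subject, predicate):
    """All objects o such that (subject, predicate, o) is in the data."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def last_names_matching(triples, first_name_pattern):
    """Return (last, person) pairs for Person_Name nodes whose first name matches."""
    pattern = re.compile(first_name_pattern)
    results = []
    for s, p, o in triples:
        # Pattern 1: ?name rdf:type tax:Person_Name
        if p != "rdf:type" or o != "tax:Person_Name":
            continue
        for first in objects(triples, s, "tax:has_first_name"):
            # FILTER regex(...): SPARQL regex matches anywhere, like re.search
            if not pattern.search(first):
                continue
            for last in objects(triples, s, "tax:has_last_name"):
                for person in objects(triples, s, "tax:name_of"):
                    results.append((last, person))
    return results


for last, person in last_names_matching(TRIPLES, "^A.*"):
    print(last, person)
```

On this sample the loop prints the Gaton and GIFT rows and skips John Brice, whose first name does not start with "A".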

Digression on converting tab-delimited files
If you go to the RDF converters page, you find http://www.w3.org/2000/10/swap/tab2n3.py. If we apply that to the Annapolis spreadsheet (tab-delimited export) we get

# headings found:  12 ['Case', 'Name of Occupant', 'Name of owner of property', 'Street Name', 'BLD MAT',
# 'Addition MAT', 'TOT # of outbuildings', 'Property size (acres)', 'Property size (sq ft)',
# 'Property value', 'Kitchen BLD MAT', 'Additional notes']
#  12 headings but 11 values
[   :case "1"; :name_of_Occupant "Joseph Wiatt"; :name_of_owner_of_property "Young (Bishop of Methodist Church)"; :street_Name "Green St."; :bLD_MAT "Frame"; :tOT___of_outbuildings "1"; ].
#  12 headings but 11 values
[   :case "2"; :name_of_owner_of_property "John Brice"; :bLD_MAT "Brick"; :tOT___of_outbuildings "2"; :kitchen_BLD_MAT "Brick"; ] .

and so on. For the Aire County file we get

# headings found:  17 ['ID', 'STATE', 'COUNTY', 'TOWNSHIP', 'OWNRLST', 'OWNRFST',
# 'OWNRSEX', 'OCCUPANT', 'OCCLST', 'OCCFST', 'OCCSEX', 'PROPVAL', 'ACRESNUM',
# 'DWELNUM', 'DWELMAT', 'OUTBLDNO', 'KITCHMAT']
#  17 headings but 16 values
[   :iD "1"; :sTATE "Pennsylvania"; :cOUNTY "Bedford"; :tOWNSHIP "Dublin"; :oWNRLST "ABROSER"; :oWNRFST "MATTHIAS"; :oWNRSEX "Male"; :oCCUPANT "Owner"; :pROPVAL "105"; :aCRESNUM "1"; :dWELNUM "1"; :dWELMAT "log"; :oUTBLDNO "1"; ] .

This is all correct Turtle (N3), and amenable to use in RDF-friendly tools, but not so useful for integration - you gain little over leaving the data in the original Excel spreadsheet (and lose a lot since now you can't use Excel!).
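
To make the mechanical nature of this conversion concrete, here is a minimal Python sketch of the same kind of row-to-blank-node translation. It is not the code of tab2n3.py. The naming rule (non-alphanumeric characters become "_", first character lowercased) is inferred from the predicate names in the output above, and the function names are made up for illustration. The sample rows are abridged from the Annapolis output, keeping only six of the twelve columns.

```python
import csv
import io
import re


def predicate_name(heading):
    """Turn a column heading into a local name, e.g. 'BLD MAT' -> 'bLD_MAT'."""
    name = re.sub(r"[^A-Za-z0-9]", "_", heading.strip())
    return name[:1].lower() + name[1:]


def n3_literal(value):
    """Quote a cell value as a string literal, escaping backslash and quote."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def tab_to_n3(text):
    """Yield one blank-node description per data row; empty cells are omitted."""
    rows = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    headings = [predicate_name(h) for h in next(rows)]
    for row in rows:
        if not row:
            continue  # skip blank lines in the export
        pairs = [
            f":{pred} {n3_literal(cell.strip())};"
            for pred, cell in zip(headings, row)
            if cell.strip()
        ]
        yield "[ " + " ".join(pairs) + " ] ."


# Abridged sample: a subset of the Annapolis columns shown above.
sample = (
    "Case\tName of Occupant\tName of owner of property\t"
    "Street Name\tBLD MAT\tTOT # of outbuildings\n"
    "1\tJoseph Wiatt\tYoung (Bishop of Methodist Church)\tGreen St.\tFrame\t1\n"
    "2\t\tJohn Brice\t\tBrick\t2\n"
)

for record in tab_to_n3(sample):
    print(record)
```

Each column simply becomes its own predicate, so the two data sets end up with unrelated vocabularies (:name_of_owner_of_property in one, :oWNRLST and :oWNRFST in the other). Relating those is the work the ontology has to do.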