Semantic resources project/Installation Instructions

Right now, "installation" of the semantic resources project basically means installing and running the Neurocommons infrastructure.

I'd also like to add documentation here for:
 * the Semantic Resources virtual 'appliance'
 * the "webherd" system for loading some of the web interfaces to some of these bundles
 * any other software or web system that results from this project.

But, for now, here's my outline of the high-level view of the Neurocommons infrastructure, including notes on how I set it up and run it to create bundles (blobs of RDF created by "bundler" software).

= Neurocommons Infrastructure =

At a high level, the Neurocommons Infrastructure consists of two kinds of components: "bundles" and "bundlers."

It also relies on two pre-installed software components: a triplestore (currently, the Virtuoso platform is the only triple store supported by the Neurocommons infrastructure) and a Neurocommons-specific piece of software called RDFHerd.

The general outline is that "bundles" are collections of triples, stored in a known location along with configuration files carrying version information, and ready for upload into a triplestore. The "bundlers" are software components that automatically download "source files" from external sources and convert them into bundles. RDFHerd is the shim software which finds bundles and loads them into a triplestore.

== Bundlers ==
"Bundlers" are software artifacts. They are downloaded from one or more sources (currently the only location hosting Neurocommons-style bundlers is the Science Commons subversion repository), and built (if necessary) using pre-existing software development tools. After some configuration, through modification of associated text files, the bundler is then run. Running some bundlers may require additional third-party software to be installed (e.g. make, bash, java 6, python 2.6+, scheme48, abcl, wget, curl, rsync, or others).

The bundler, as it runs, may download source files from external or third-party sources. These source files are the raw data of the ultimate bundle produced by the bundler. Source files can be left in an intermediate cache after the bundler is run; most bundlers (through their use of 'make') are smart enough not to (re-)download identical source files that already exist in the cache.
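The caching behaviour can be sketched roughly as follows. This is a hypothetical illustration only, not the real bundler code: the actual bundlers rely on make's dependency checking, and the <tt>CACHE</tt> variable and <tt>fetch</tt> helper here are invented names.

```shell
# Sketch of the source-file caching described above, using wget's
# timestamping as a stand-in for make's out-of-date checks.
CACHE=./cache
mkdir -p "$CACHE"

fetch() {
  # wget -N only downloads when the remote file is newer than the
  # cached copy, so re-running skips unchanged source files.
  wget -N -P "$CACHE" "$1"
}
```

A second run of such a script leaves identical cached files untouched, which is the same effect the make-based bundlers achieve.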

More documentation on the use of bundlers in the Neurocommons distribution is present on the RDF distribution page.

== Bundles ==
Each bundler produces, as output, a "bundle." A bundle is a collection of RDF triples, suitable for upload into a triple store.

Bundles are also accompanied by version information. Each bundler increments the version of its output bundle if the source files changed since a previous bundler run (and hence, if the bundle itself had to be regenerated from those source files). RDFHerd, or other downstream consumers of the bundles, should pay attention to version numbers and only reload the corresponding triples (into a persistent store) if the version number has increased.
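The version check a downstream consumer might perform can be sketched like this. This is illustrative only: the <tt>VERSION</tt> file name, the directory name, and the version numbers are all invented for the example, not taken from the real bundle format.

```shell
# Hypothetical version comparison by a downstream consumer of a bundle.
mkdir -p demo-bundle
echo 4 > demo-bundle/VERSION        # pretend the bundler just bumped to 4
loaded_version=3                    # version already in the store (assumed)
bundle_version=$(cat demo-bundle/VERSION)

# Reload only when the bundle's version has increased.
if [ "$bundle_version" -gt "$loaded_version" ]; then
  action=reload
else
  action=skip
fi
echo "$action"                      # prints "reload"
```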

The "Neurocommons distribution" is a basic set of bundlers, available from the Science Commons subversion repository. The triples which make up the bundles produced by these bundlers are documented, at least partially, on the Bundles page.

== RDFHerd ==
The RDFHerd software finds bundles and loads them into a Virtuoso triplestore. It does this in a "smart" way, such that bundles which already exist in the triplestore are not reloaded, and with enough logging such that if the load is interrupted it can be restarted (mid-stream) without having to clean and reload the entire store. Furthermore, RDFHerd is optimized for loading triples quickly into the Virtuoso system.
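The "smart", restartable loading can be sketched with a simple log of already-loaded bundles. This is not RDFHerd's actual code; the log file name and the <tt>load_bundle</tt> helper are invented for illustration.

```shell
# Sketch of restartable loading: record each loaded bundle in a log,
# and skip bundles already recorded, so an interrupted run can resume.
LOG=./load.log
touch "$LOG"

load_bundle() {
  if grep -qx "$1" "$LOG"; then
    echo "skip $1"        # already in the store; do not reload
  else
    echo "load $1"        # (real RDFHerd would load the triples here)
    echo "$1" >> "$LOG"   # log it so a restart picks up where we left off
  fi
}

load_bundle obo
load_bundle obo           # second call is skipped thanks to the log
```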

RDFHerd is a collection of perl modules, documented elsewhere on this wiki.

== Creating a Neurocommons Bundle ==
NB: Confusingly, and for historical reasons, sometimes "bundlers" are referred to as "packages." Just bear with me.

=== Downloading the Neurocommons Bundlers ===
 svn co http://svn.neurocommons.org/svn/trunk/packages bundlers

This command creates a single subdirectory, <tt>bundlers</tt>, under the current directory; this subdirectory contains all the bundlers available from Neurocommons. Each bundler is available as a subdirectory of <tt>bundlers</tt>, except for the <tt>common</tt> subdirectory, which holds configuration files for all the other bundlers.

=== Configuration ===
 cp bundlers/default.mk.doc bundlers/default.mk

Then, edit the file <tt>bundlers/default.mk</tt> with your favorite text editor. Read the instructions inside, which will tell you which variables need to be defined, and how to define them.

In particular, at least two variables must be defined: <tt>AUTHORITY</tt> and <tt>COMMON</tt>.

Furthermore, the easiest route to finishing the configuration is to
 * 1) define the <tt>BUILD_ROOT</tt> variable to point to a location under which the cached "source files" and output "bundles" can be stored for each bundler, and then
 * 2) uncomment the three variable definitions (for <tt>CACHE</tt>, <tt>WORK</tt>, and <tt>BUNDLE</tt>) that use <tt>BUILD_ROOT</tt> (but are commented out in the default version of this file).
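After those steps, the edited file might look something like the fragment below. The variable names come from the instructions above, but every value shown here is an example only; use the paths and authority appropriate to your own machine.

```make
# Hypothetical default.mk fragment -- values are illustrative.
AUTHORITY = example.org
COMMON = $(HOME)/bundlers/common
BUILD_ROOT = $(HOME)/nc-build

# The three BUILD_ROOT-based definitions, uncommented:
CACHE = $(BUILD_ROOT)/cache
WORK = $(BUILD_ROOT)/work
BUNDLE = $(BUILD_ROOT)/bundle
```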

At this point, you should probably also add <tt>bundlers/common/bin</tt> to your PATH.

=== Running a Bundler the First Time ===
Creating the bundle using the (now-configured) bundler is easy: simply go to the bundler directory, and execute (first) the <tt>prepare</tt> Makefile target, and (then) the <tt>bundle</tt> target.

 cd bundlers/<bundle-name>
 make prepare
 make bundle

Errors that result are probably from missing dependencies. For instance, several bundlers require the <tt>scheme48</tt> interpreter to be installed (and in the PATH). If you get an error from not being able to find <tt>http-validate</tt> (which does conditional GETs of web-accessible resources, saves old copies of the same resource, and does some URL munging in between), it's because you haven't put <tt>bundlers/common/bin</tt> on your PATH, etc.

=== Re-Running a Bundler to Update the Bundle ===
Just go to the bundler directory, and re-run the Makefile's <tt>bundle</tt> target.

 cd bundlers/<bundle-name>
 make bundle

=== Loading the Bundles ===
At this point, if you haven't installed Virtuoso and rdfherd, or if you have your own triple-storing solution, you could just stop. Assuming you followed the instructions in the Configuration section above, each bundle is available as a collection of .rdf, .owl, or .ttl files in the directory <tt>$(BUNDLE)/$(PACKAGE)</tt>, where <tt>$(BUNDLE)</tt> is the value of the variable of the same name you defined in the <tt>default.mk</tt> file above, and <tt>$(PACKAGE)</tt> is the name of the bundler. (Yes, I know that's confusing.)
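A quick way to sanity-check that a bundler produced output is to look in that directory. In this sketch the layout follows the <tt>$(BUNDLE)/$(PACKAGE)</tt> convention above, but the paths, the bundler name, and the dummy <tt>.ttl</tt> file are all invented for illustration.

```shell
# Hypothetical check of a bundler's output directory; the mkdir/touch
# lines stand in for a real bundler run.
BUNDLE=./demo-build/bundle     # assumed value of BUNDLE from default.mk
PACKAGE=example-bundler        # hypothetical bundler name
mkdir -p "$BUNDLE/$PACKAGE"
touch "$BUNDLE/$PACKAGE/data.ttl"

# Count the output files; a real run should show .rdf/.owl/.ttl files.
count=$(ls "$BUNDLE/$PACKAGE" | wc -l)
```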

However, if you've already gone ahead and set up Virtuoso and rdfherd, then you need to take only one more step in order to run a fully functioning Neurocommons mirror: run RDFHerd itself.

=== Running RDFHerd ===
Assuming RDFHerd has been correctly installed, you should be able to run the command

 rdfherd <virtuoso-dir> start

to start up Virtuoso.

At this point, you're basically in Section 4 of the "Neurocommons Installation" document, so you might as well consult that for the full instructions on how to start up and shut down Virtuoso using rdfherd, load bundles, etc.

= Links =

== This Wiki ==
Some work on the architecture of Neurocommons, SCF, and SWAN may be necessary as part of this project.
 * Neurocommons Installation Instructions, which we developed for use on an Amazon Web Services platform.
 * SCF Notes for out-of-the-box installation of Drupal and the Science Collaboration Framework.
 * Interface Requirements Notes for Neurocommons/SCF interaction.

== External ==

 * Alan Bawden's text instructions for installing a Neurocommons have been adapted to wiki form: "Building a Neurocommons Mirror".