Semantic resources project/Installation/AWS

These are some notes I've kept while installing a Neurocommons Mirror on the Amazon EC2 instance. I've tried to follow Alan Bawden's manual throughout. --Tim

Users
I'm running the Virtuoso server under my personal user name, tim. That is reflected in some of the configuration file settings below, but should probably ultimately be changed.

In what follows, I'll refer to whatever user we've chosen to run the Virtuoso server under as the Virtuoso user -- in my case this is tim, so whenever you see that name in the documents below, it can be replaced with another user name consistently (Alan B.'s documentation recommends the virtuoso username and group).

Directories
I've installed all the directories needed by Virtuoso under a single point in the filesystem: /srv/virtuoso. In our working EC2 instance, this happens to be an EBS (Elastic Block Store) device, which is attached to the instance but could potentially be moved from instance to instance.

Working directory
The Virtuoso installation directory, or working directory, is the directory into which all the major configuration files will be placed. This directory is also the first argument to every invocation of rdfherd. For our installation, I've chosen the install subdirectory of the top-level directory:

/srv/virtuoso/install

Bundles directory
The bundles themselves should be downloaded into a bundle directory, which is also specified in the rdfherd-config.pl file below. I've chosen: /srv/virtuoso/bundles

This is the location into which the bundles are rsync'ed. I've been getting my bundles from the development location on norbert:

/raid/not_backed_up/development-bundles/

As a start, I scp'ed the contents of this directory except for the medline, transitive-properties-*, and inferred-relations packages.
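The selective copy described above could also be done with rsync exclude patterns. This is only a sketch: sync_bundles is a hypothetical helper name, and the three patterns assume that bundle files or directories start with the package name, which these notes do not confirm.

```shell
# Sketch: copy bundles from a source location, skipping the three package
# families left out of the initial copy. The patterns are assumptions about
# how bundle files are named on disk.
# Usage: sync_bundles SOURCE DEST
sync_bundles() {
    src="$1"
    dest="$2"
    rsync -av \
        --exclude='medline*' \
        --exclude='transitive-properties-*' \
        --exclude='inferred-relations*' \
        "$src" "$dest"
}
```

For the setup in these notes the call would look something like: sync_bundles norbert:/raid/not_backed_up/development-bundles/ /srv/virtuoso/bundles/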

Disks & Striping
The major difference between the installation as I've set it up and Alan Bawden's original instructions has to do with the way that the disks and striping are arranged. As I understand it, there are no separate disks on the EC2 instances; therefore, there is no striping across multiple disks. I use a single stripe directory:

/srv/virtuoso/stripe0

The big question here is:

Does this need to be a separate directory?

In other words, it's not clear from Alan's documentation whether this should be separate, or whether (in this one-stripe AWS world) I could just make this the same as the installation directory.
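The directory layout described in this section can be created in one step. A minimal sketch, assuming the stripe directory is kept separate from the installation directory; make_virtuoso_dirs is a hypothetical helper name and not part of rdfherd.

```shell
# Sketch: create the install, bundles, and stripe0 directories under one root.
# Usage: make_virtuoso_dirs ROOT [OWNER]
make_virtuoso_dirs() {
    root="$1"
    owner="${2:-}"
    mkdir -p "$root/install" "$root/bundles" "$root/stripe0" || return 1
    # Hand the tree to the Virtuoso user only when an owner is given
    # (changing ownership to another user normally requires root).
    if [ -n "$owner" ]; then
        chown -R "$owner" "$root"
    fi
}
```

For the setup in these notes the call would be: make_virtuoso_dirs /srv/virtuoso tim:tim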

Configuration Files
There are three working configuration files which need to be inspected and (possibly) modified.

/etc/rdfherd-config.pl
/srv/virtuoso/install/Config.pl
/srv/virtuoso/install/virtuoso.ini

The last two files are created by rdfherd automatically, and reside in your Virtuoso installation directory. They will need to be modified (as described below) after they are generated.

rdfherd-config.pl
The bin directory of the rdfherd installation tarball contains a configuration file, rdfherd-config.pl. This file needs to be:
 * 1) copied to the /etc directory,
 * 2) made readable by the Virtuoso user (e.g. "chown tim:tim /etc/rdfherd-config.pl" in this case),
 * 3) and modified to reflect the local setup.
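The first two steps can be sketched as a small shell function. install_rdfherd_config is a hypothetical name, and the optional third argument exists only so the function can be tried against a scratch directory instead of /etc.

```shell
# Sketch: copy rdfherd-config.pl from the tarball's bin directory and make it
# readable by the Virtuoso user. Editing the file is still a manual step.
# Usage: install_rdfherd_config TARBALL_BIN_DIR OWNER [ETC_DIR]
install_rdfherd_config() {
    src="$1/rdfherd-config.pl"
    owner="$2"
    dest="${3:-/etc}/rdfherd-config.pl"
    cp "$src" "$dest" || return 1
    chown "$owner" "$dest" || return 1
    chmod u+r "$dest"
}
```

Run as root, the call for this installation would resemble: install_rdfherd_config ./bin tim:tim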

I modified the following lines in my rdfherd-config.pl file.

bundle_dir => "/srv/virtuoso/bundles",
stripe_dirs => ["/srv/virtuoso/stripe0"],
cpu_cores => 2,
install_dir => "/usr/local",
file_to_string_patch => 1,

The major changes in the rdfherd-config.pl file are (a) the cpu_cores setting, which is dictated by the EC2 instance parameters, and (b) the stripe_dirs list, which will have only one entry. See the discussion of this directory, above.

After the rdfherd-config.pl file has been copied and edited, the very first rdfherd command needs to be run:

rdfherd /srv/virtuoso/install configure

This sets up the Config.pl and virtuoso.ini files, which are described below.
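A small wrapper can confirm that the configure step produced both files before you start editing them. This is a sketch: configure_install is a hypothetical name, and it assumes that rdfherd returns a non-zero exit status on failure, which I have not verified.

```shell
# Sketch: run the configure step, then check that both generated files exist.
# Usage: configure_install INSTALL_DIR
configure_install() {
    dir="$1"
    rdfherd "$dir" configure || return 1
    for f in Config.pl virtuoso.ini; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing $dir/$f" >&2
            return 1
        fi
    done
}
```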

Config.pl
The Config.pl file is located in

/srv/virtuoso/install/Config.pl

and is created by the command

rdfherd /srv/virtuoso/install configure

I modified the following lines in Config.pl

http_port => 7555,
sql_port => 7556,
require_user => "tim",
number_of_buffers => 160_000,
max_dirty_buffers => 120_000,

The buffer settings are half of 320000 and 240000 respectively, the amounts recommended for a 4 GB machine. I halved them because our initial installation is on a "c1.medium" EC2 instance, which has ~2 GB of memory.
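The halving above can be written as a small calculation. Scaling linearly with memory is my own extrapolation from the 4 GB recommendation, not something the documentation states, and scale_buffers is a hypothetical helper name.

```shell
# Sketch: scale the 4 GB recommendation (320000 / 240000) linearly by memory.
# Prints "number_of_buffers max_dirty_buffers" for a whole number of GB.
# Usage: scale_buffers MEM_GB
scale_buffers() {
    mem_gb="$1"
    echo "$((320000 * $mem_gb / 4)) $((240000 * $mem_gb / 4))"
}
```

Calling scale_buffers 2 gives the two values used in the Config.pl settings above.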

The username is tim, which is my personal user on the EC2 instance and the user under which I run the Virtuoso server. Ultimately, we should probably run this as a different user.

The following settings are also in place, lower in the Config.pl file:

striping => 1,
segment_size => "5G",
segment_count => 10,
file_extend => 64 * 1024,

These will later be matched in the virtuoso.ini file.

virtuoso.ini
At Alan R.'s suggestion, I changed some entries in the virtuoso.ini file.

In the [Parameters] section, the following line values need to be checked:

[Parameters]
...
NumberOfBuffers = 160000
MaxDirtyBuffers = 120000

They should match the corresponding values in the Config.pl file, above.
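This comparison can be scripted rather than done by eye. The sketch below assumes each setting sits on its own line in both files; the helper names config_value, ini_value, and buffers_match are hypothetical.

```shell
# Print the first numeric value assigned to KEY in a Config.pl-style file
# (lines such as "number_of_buffers => 160_000,"), with underscores removed.
config_value() {
    sed -n "s/^[[:space:]]*$1[[:space:]]*=>[[:space:]]*\([0-9_][0-9_]*\).*/\1/p" "$2" |
        head -n 1 | tr -d '_'
}

# Print the first numeric value assigned to KEY in an INI-style file
# (lines such as "NumberOfBuffers = 160000").
ini_value() {
    sed -n "s/^[[:space:]]*$1[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p" "$2" |
        head -n 1
}

# Succeed only if both buffer settings are present and agree in the two files.
# Usage: buffers_match CONFIG_PL VIRTUOSO_INI
buffers_match() {
    cfg="$1"
    ini="$2"
    for pair in number_of_buffers:NumberOfBuffers max_dirty_buffers:MaxDirtyBuffers; do
        want=$(config_value "${pair%%:*}" "$cfg")
        have=$(ini_value "${pair##*:}" "$ini")
        if [ -z "$want" ] || [ "$want" != "$have" ]; then
            echo "mismatch: ${pair%%:*}=$want vs ${pair##*:}=$have" >&2
            return 1
        fi
    done
}
```

For this installation the check would be: buffers_match /srv/virtuoso/install/Config.pl /srv/virtuoso/install/virtuoso.ini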

In the [Striping] section, I only included the following segment line:

[Striping]
Segment1 = 5G, /srv/virtuoso/stripe0/install/0001.db = q0

I deleted all of the other "SegmentN" lines before starting the server.

Security and Ports
The Config.pl settings above put HTTP on port 7555 and SQL on port 7556, so both ports must be open for remote access. However, if you are running this Virtuoso installation on an AWS machine, most ports are shut off by default. You need to explicitly open these ports in the Amazon Web Console for your EC2 instance before you will be able to remotely access either Conductor or the SPARQL endpoint.
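I opened the ports through the web console, but the same change could be scripted. The sketch below assumes the present-day AWS command-line tool (aws) is installed and configured, which these notes do not cover; open_virtuoso_ports is a hypothetical name, and the security group ID and address range are placeholders you must supply.

```shell
# Sketch: allow inbound TCP on the HTTP (7555) and SQL (7556) ports chosen in
# Config.pl, for one security group and one source address range.
# Usage: open_virtuoso_ports SECURITY_GROUP_ID CIDR
open_virtuoso_ports() {
    group_id="$1"
    cidr="$2"
    for port in 7555 7556; do
        aws ec2 authorize-security-group-ingress \
            --group-id "$group_id" --protocol tcp --port "$port" --cidr "$cidr" || return 1
    done
}
```

Restricting the address range to the machines that actually need access is safer than opening the SQL port to everyone.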

Once the ports are open, the server can be started using the command:

rdfherd /srv/virtuoso/install start

and stopped using

rdfherd /srv/virtuoso/install stop

The SPARQL endpoint can be reached through the address:

http://machine.address:7555/sparql
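A quick way to confirm the endpoint is reachable is to send it a query with curl. This is a sketch: sparql_query is a hypothetical helper name, the port is the http_port from Config.pl, and requesting JSON results is my own choice rather than anything these notes require.

```shell
# Sketch: send a SPARQL query to the endpoint over HTTP GET and print the reply.
# Usage: sparql_query HOST QUERY
sparql_query() {
    host="$1"
    query="$2"
    curl -sS -G "http://$host:7555/sparql" \
        --data-urlencode "query=$query" \
        -H "Accept: application/sparql-results+json"
}
```

For example: sparql_query machine.address 'SELECT * WHERE { ?s ?p ?o } LIMIT 5'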

Neurocommons Bundles
Bundles can be automatically added to the running Virtuoso server using the bundle_update command to rdfherd.

rdfherd /srv/virtuoso/install bundle_update <bundle_names>

They should be rsync'ed to the bundles directory first, however.
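The update step can be wrapped so that it refuses to run without any bundle names. A sketch only: update_bundles is a hypothetical name, and passing several names in one call is an assumption based on the plural in the usage line above.

```shell
# Sketch: load one or more bundles that are already in the bundles directory.
# Usage: update_bundles INSTALL_DIR BUNDLE...
update_bundles() {
    install_dir="$1"
    shift
    if [ "$#" -eq 0 ]; then
        echo "usage: update_bundles INSTALL_DIR BUNDLE..." >&2
        return 2
    fi
    rdfherd "$install_dir" bundle_update "$@"
}
```

For example: update_bundles /srv/virtuoso/install bfo mesh skos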

For my test installation of Neurocommons, I downloaded the following bundles and bundle-groups from norbert:

addgene bfo hla ipi mahco mesh mesh-eswc06 ncbi obo omim pdb pdb.danc pdbsc sciencecommons senselab skos