Virtuoso performance

Pointers to places where Virtuoso performance tuning information is posted:


 * Virtuoso's RDF Performance Tuning documentation
 * Virtuoso's Performance Tuning documentation from the Server Administration section
 * Virtuoso's RDF Data Representation documentation
 * Virtuoso's SQL Optimization documentation
 * Virtuoso's isql reference for knowing how to poke around the system
 * Virtuoso's Performance Diagnostics page
 * Orri Erling's blog post DBpedia Benchmark Revisited
 * Orri Erling's paper: Towards Web Scale RDF
 * Some notes from the Banff demo
 * Full text indexing of medline abstracts, more

A basic test of hard disk read speed on GNU/Linux is hdparm. On the Neurocommons server our db disks are /dev/sdx1, where x is one of {a,b,c,d,e,f,g,h}.

hdparm -t /dev/sda

/dev/sda1: Timing buffered disk reads: 324 MB in  3.01 seconds = 107.58 MB/sec
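When comparing several disks it can help to pull the numbers out of the hdparm output programmatically. The following is a minimal sketch, assuming the output line has the same shape as the sample above; the helper name parse_hdparm_line is hypothetical and not part of hdparm.

```python
import re

# Matches the "Timing buffered disk reads" line printed by `hdparm -t`.
# Assumption: the line looks like the sample shown on this page.
_HDPARM_RE = re.compile(
    r"Timing buffered disk reads:\s*([\d.]+)\s*MB in\s*([\d.]+)\s*seconds"
    r"\s*=\s*([\d.]+)\s*MB/sec"
)


def parse_hdparm_line(line):
    """Return (megabytes_read, seconds, mb_per_sec), or None if no match."""
    match = _HDPARM_RE.search(line)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2)), float(match.group(3))


sample = "/dev/sda1: Timing buffered disk reads: 324 MB in  3.01 seconds = 107.58 MB/sec"
result = parse_hdparm_line(sample)
```

The third value is the throughput as reported by hdparm itself; it is not recomputed from the first two.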

A basic sequential read through the RDF database can be done with

select count(*) from rdf_quad table option (index rdf_quad)

To remove the effect of caching on Linux, drop the caches first (this needs root):

sync; echo 3 > /proc/sys/vm/drop_caches

One would expect the time for this select to be within a reasonable factor of the raw disk speed.
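To put a number on "a reasonable factor", the measured query time can be divided by the time the disk would need to read the same amount of data sequentially. This is only a sketch: the function name scan_slowdown and the index size and query time used below are made up for illustration, and only the 107.58 MB/sec figure comes from the hdparm run above.

```python
def scan_slowdown(data_size_mb, disk_mb_per_sec, query_seconds):
    """Ratio of measured query time to the ideal sequential read time.

    A result near 1.0 means the scan runs at raw disk speed; larger
    values mean the scan is that many times slower than the disk.
    """
    if data_size_mb <= 0 or disk_mb_per_sec <= 0:
        raise ValueError("size and disk speed must be positive")
    ideal_seconds = data_size_mb / disk_mb_per_sec
    return query_seconds / ideal_seconds


# Illustrative numbers only: 10758 MB of index data on a disk that reads
# 107.58 MB/sec needs about 100 seconds at best, so a count(*) that takes
# 250 seconds is roughly 2.5 times slower than the raw disk.
ratio = scan_slowdown(10758.0, 107.58, 250.0)
```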

To compute how much space the tables are taking, use

select ISS_KEY_TABLE,ISS_KEY_NAME,ISS_NROWS,ISS_ROW_BYTES,ISS_ROW_PAGES from DB.DBA.SYS_INDEX_SPACE_STATS where ISS_KEY_TABLE like 'DB.DBA.RDF_%'

To see what SQL your SPARQL query compiles to:

select sparql_to_sql_text('the sparql query');

To see the query plan for that query:

explain('sparql the sparql query');

If you pass -5 as the second argument to explain, the Virtuoso server will write to its own stdout a plan annotated with a lot of cost estimate information. If you pass -7, explain will return a float value that is a cost estimate. Note that this is broken in version 5 releases earlier than 5.07. Here is a patch.

To see how many file descriptors are open:

cat /proc/sys/fs/file-nr

2560   0       1604434

The first column is how many are open, the second how many are free, and the third the total allowed.
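For periodic monitoring, the three columns of /proc/sys/fs/file-nr can be read into named fields. A minimal sketch follows; the function names and the dictionary keys are hypothetical, chosen to match the column description on this page.

```python
def parse_file_nr(text):
    """Split the contents of /proc/sys/fs/file-nr into named counters."""
    fields = text.split()
    if len(fields) != 3:
        raise ValueError("expected three whitespace-separated numbers")
    allocated, free, maximum = (int(field) for field in fields)
    return {
        "allocated": allocated,        # first column
        "free": free,                  # second column
        "max": maximum,                # third column
        "in_use": allocated - free,    # derived: allocated minus free
    }


def read_file_nr(path="/proc/sys/fs/file-nr"):
    """Read and parse the live counters (Linux only)."""
    with open(path) as handle:
        return parse_file_nr(handle.read())


# Parsing the sample output shown on this page.
stats = parse_file_nr("2560   0       1604434")
```

Calling read_file_nr() on the server gives the current values; comparing in_use against max shows how close the system is to its limit.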

Where to find sysstat. Run its iostat tool to display megabytes/sec for all partitions every 2 seconds:

iostat -p -m -d 2

Or use xosview to see something graphical:

xosview -page -int -net -cpu -swap &

Orri says: For dropping a table faster in version 5, drop all the indices one by one first. This reads all in sequence. The drop table deletes all rows and does random access, deleting index entries row by row. This is quite needless but still is what it does.

A useful optimization to improve locality is to do a full backup, delete the db files, and then restore. On restore we have seen a 10% improvement on some queries.

From Orri Erling there is this [[Media:Ldmeter.sql|script]] for metering some of Virtuoso's operations. He says: Load this with isql and then start by running a single isql with the command ld_meter_run (600) for a sample every 600 seconds. This does not return but can be stopped by killing the isql that started it. The data gets written into the table ld_metric. To take an extra sample, just call ld_sample from any client session. The load rates are accurate if there is no other activity on the rdf tables during the measurement.

Sometimes there is occasion to kill an errant HTTP request. To do so, use txn_killall.

select top 10 * from SYS_D_STAT order by TOUCHES desc

The result of the above query gives the 10 indexes that are written to most often. The column TOUCHES in the row where KEY_TABLE='DB.DBA.RDF_QUAD' gives the number of inserted/deleted triples. If you run the query twice, 10 seconds apart, the difference in TOUCHES gives the throughput (triples added or deleted during those 10 seconds).
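The arithmetic is simple enough to do by hand, but a small helper avoids mistakes when the sampling interval varies. This is a sketch only: the function name touches_rate and the two sample TOUCHES values are invented for illustration.

```python
def touches_rate(touches_before, touches_after, interval_seconds):
    """Triples added or deleted per second between two TOUCHES samples."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return (touches_after - touches_before) / interval_seconds


# Hypothetical readings of TOUCHES for DB.DBA.RDF_QUAD taken 10 seconds apart.
rate = touches_rate(1_200_000, 1_450_000, 10)
```

As with the ld_meter script, the figure is only meaningful if nothing else is touching the RDF tables during the interval.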

The procedure call status; prints general statistics in the lines between "Database Status" and "Current backup timestamp".

See also: One graph experiment, Questions for Openlink, Loading, Tests

Report bugs to the public mailing list or support forums