Project site

Boundz - a vocabulary for expressing reservations about incoming data

Abstract

Boundz is a vocabulary that defines generic constraints for expressing dependencies between source and target datasets in situations where a source dataset is to be merged with a target dataset and the target is particular about what data it is willing to accept. These constraints are called bounds. They are based on the triple structure of RDF graphs and draw on ideas from bisimulation in modal logic, conservative extensions of ontologies, and relational peer database exchange. Although simple and generic, bounds have powerful and practical natural interpretations and favorable computational properties. The vocabulary is designed to allow bounds to be specified, published, composed and re-used.

Vocabulary

Vocabulary URI: http://sws.ifi.uio.no/vocab/boundz
Vocabulary definition: http://sws.ifi.uio.no/vocab/boundz.owl
Vocabulary documentation: http://sws.ifi.uio.no/vocab/boundz.html
Boundz Library: http://sws.ifi.uio.no/vocab/boundzLibrary

Prototype implementation

A test prototype for checking bounds is available: boundzer.jar.

Quick start

Run the program with the command

java -jar boundzer.jar myExchangeSchemas.rdf outputExchanges.rdf outputTime.txt

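where the three arguments name, in order, the input RDF file of exchange schemas, the output file for the computed exchanges, and the output file for the running times.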

More details

The prototype is written in Java and uses the Jena framework and Pellet reasoner.

It takes as input an RDF file containing one or more ExchangeSchema instances, and computes and outputs an Exchange instance for every schema.

After the input file is read into memory, reasoning is applied to the exchange schemas to reveal possible inconsistencies and to simplify parsing of the vocabulary model, e.g., by using superproperties to discover the different schema settings. For each exchange schema, the specified source and target graphs are read into memory and, if the schema specifies it, saturated. Exchanges are then computed and written to output according to the settings in the schema. Bounds are checked by a simple algorithm that loops straightforwardly through all possible triple matches to the bounds and reports any violation it finds.
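The naive checking loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the prototype's code: it treats a bound as a triple pattern with `*` wildcards and counts a source triple as a violation when it matches a bound and is not already present in the target. That reading of a bound, the class name NaiveBoundCheck and the `ex:` identifiers are all hypothetical; the prototype itself works on Jena models and the Boundz vocabulary.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class NaiveBoundCheck {

    /** Wildcard marker for a pattern position (an assumption of this sketch). */
    static final String ANY = "*";

    /** True if the triple agrees with the pattern in every non-wildcard position. */
    static boolean matches(List<String> pattern, List<String> triple) {
        for (int i = 0; i < 3; i++) {
            String p = pattern.get(i);
            if (!p.equals(ANY) && !p.equals(triple.get(i))) {
                return false;
            }
        }
        return true;
    }

    /**
     * Loops over every (bound, source triple) pair and collects the source
     * triples that match a bound but are not in the target. A triple that
     * matches several bounds is reported once per bound.
     */
    static List<List<String>> violations(Set<List<String>> source,
                                         Set<List<String>> target,
                                         List<List<String>> bounds) {
        List<List<String>> found = new ArrayList<>();
        for (List<String> bound : bounds) {
            for (List<String> triple : source) {
                if (matches(bound, triple) && !target.contains(triple)) {
                    found.add(triple);
                }
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Hypothetical data: the target already knows one ex:knows triple.
        Set<List<String>> target = Set.of(List.of("ex:a", "ex:knows", "ex:b"));
        Set<List<String>> source = Set.of(
                List.of("ex:a", "ex:knows", "ex:b"),
                List.of("ex:a", "ex:knows", "ex:c"),
                List.of("ex:c", "ex:name", "C"));
        // Hypothetical bound: no new ex:knows triples about ex:a.
        List<List<String>> bounds = List.of(List.of("ex:a", "ex:knows", ANY));

        // prints [[ex:a, ex:knows, ex:c]]
        System.out.println(violations(source, target, bounds));
    }
}
```

The nested loop costs time proportional to the number of bounds times the number of source triples, with a constant-time set lookup against the target for each match.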

Evaluation

[Figure: evaluation results]

We have done a simple evaluation of the prototype, with promising results. As test data we used the Lehigh University Benchmark data generator, producing source graphs of various sizes. Different combinations of graphs were used as source graphs in an exchange setting, all using the same bounds and the same target.

The running times of the experiment are presented in the graph above and the table below. The graph plots the size of the sources against the times of all the individual runs. The clocked time covers parsing the exchange schema specification and computing bounds, payload and violations; it does not include loading the sources and target into memory.
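The measurement scheme described above, timing only the computation and averaging over repeated runs, can be sketched as follows. This is an illustration and not the evaluation script from evaluation.tgz: the class TimedRuns, its method names and the placeholder task are assumptions. The numbers in main are the ten run times from the first row of the table below, used to show how the average and minutes columns relate to the individual runs.

```java
import java.util.Locale;

public class TimedRuns {

    /** Runs the task the given number of times and returns each elapsed time in ms. */
    static double[] timeRuns(Runnable task, int runs) {
        double[] ms = new double[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            ms[i] = (System.nanoTime() - start) / 1_000_000.0;
        }
        return ms;
    }

    /** Arithmetic mean of the values, or 0 for an empty array. */
    static double average(double[] values) {
        double sum = 0;
        for (double v : values) {
            sum += v;
        }
        return values.length == 0 ? 0 : sum / values.length;
    }

    /** Converts milliseconds to minutes. */
    static double toMinutes(double ms) {
        return ms / 60_000.0;
    }

    public static void main(String[] args) {
        // Loading sources and target would happen here, outside the timed region.
        Runnable compute = () -> { /* placeholder for computing bounds, payload and violations */ };
        double[] measured = timeRuns(compute, 10);
        System.out.println("timed runs: " + measured.length);

        // Run times (ms) from the first table row, source size 15,572 triples.
        double[] row = {1827.78, 1959.33, 2114.53, 2231.80, 2260.93,
                        1962.55, 1965.53, 2064.76, 2036.15, 1808.15};
        double avg = average(row);
        // prints 2023.15 ms = 0.034 min
        System.out.println(String.format(Locale.ROOT, "%.2f ms = %.3f min", avg, toMinutes(avg)));
    }
}
```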

The complete evaluation script, input data, bound specification and test exchange specimen are available for download: evaluation.tgz (70MB).

Complete experiment results
Columns: target triples, source triples, average over the ten runs (ms), average over the ten runs (min), then the individual times of runs 1 to 10 (ms).
6,639,877 15,572 2023.15 0.034 1827.78 1959.33 2114.53 2231.80 2260.93 1962.55 1965.53 2064.76 2036.15 1808.15
6,639,877 107,596 3559.30 0.059 4382.54 3089.90 3311.90 3076.79 4217.75 3194.27 4647.49 3027.75 3384.30 3260.34
6,639,877 158,578 4452.81 0.074 4546.31 4373.57 4296.58 5012.39 4741.71 4220.97 4609.97 4174.81 4237.87 4313.90
6,639,877 250,602 5441.04 0.091 5671.31 5267.75 5295.15 5733.34 6028.29 5104.36 5099.94 5802.70 5174.17 5233.36
6,639,877 1,279,628 28111.75 0.469 28064.85 27069.16 27828.30 28281.49 28168.82 28944.94 25996.05 31279.10 28647.97 26836.79
6,639,877 1,368,315 30144.24 0.502 29248.17 34535.03 28429.24 30874.17 30909.17 31836.01 30224.60 28257.47 28182.77 28945.75
6,639,877 1,422,634 28615.22 0.477 27835.76 29652.17 28175.14 29399.97 27473.35 30817.01 28291.24 27454.20 29187.63 27865.74
6,639,877 1,460,339 29920.36 0.499 28782.27 31352.50 30733.95 28777.36 30855.77 31097.92 30864.75 28986.93 29099.02 28653.14
6,639,877 2,632,371 56885.04 0.948 57738.53 57552.17 56983.68 55298.79 56846.12 54952.22 57785.07 58539.25 55833.44 57321.12
6,639,877 6,577,267 170320.66 2.839 169573.93 164136.18 172136.99 169161.26 183292.08 168044.46 169983.03 170514.81 171006.17 165357.67
6,639,877 6,661,615 167908.90 2.798 165749.33 166464.04 159552.31 169553.92 174164.74 170197.48 167135.92 166227.75 165822.66 174220.84
6,639,877 6,669,291 168234.24 2.804 172510.12 170222.17 162910.84 170807.19 155599.72 180238.54 175328.95 167305.29 157181.64 170237.94
6,639,877 6,804,621 171355.42 2.856 167914.69 172875.49 173671.99 170581.84 173294.85 174411.77 175940.29 160910.60 170171.22 173781.42
6,639,877 7,841,323 191444.74 3.191 188457.03 184540.94 197559.92 194708.19 191332.70 192599.79 191330.49 191149.41 190825.99 191942.97
6,639,877 8,014,358 194236.56 3.237 201292.20 195111.67 192764.98 189748.69 187641.52 192344.22 195427.41 196339.77 193140.04 198555.06
6,639,877 13,223,310 319928.52 5.332 316298.76 312809.33 322265.68 328331.70 326522.85 319260.91 325611.32 323683.79 311088.12 313412.78
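
One way to read the table as a scaling trend is to normalize the average running time by source size. The sketch below does this for four rows copied from the table; the class CostPerTriple and the choice of "ms per thousand source triples" as the unit are assumptions made for illustration.

```java
import java.util.Locale;

public class CostPerTriple {

    /** Average running time in ms per 1,000 source triples. */
    static double msPerThousandTriples(long sourceTriples, double averageMs) {
        return averageMs / sourceTriples * 1000.0;
    }

    public static void main(String[] args) {
        // Source sizes and average times (ms) copied from four table rows.
        long[] sourceTriples = {15_572L, 1_279_628L, 6_577_267L, 13_223_310L};
        double[] averageMs = {2023.15, 28111.75, 170320.66, 319928.52};

        for (int i = 0; i < sourceTriples.length; i++) {
            double cost = msPerThousandTriples(sourceTriples[i], averageMs[i]);
            System.out.println(String.format(Locale.ROOT,
                    "%d source triples: %.2f ms per 1000 triples", sourceTriples[i], cost));
        }
    }
}
```

For these rows the cost comes out at roughly 130, 22, 26 and 24 ms per thousand source triples. Apart from the smallest source, the per-triple cost stays in a narrow band in this sample.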

References

  1. M. G. Skjæveland and A. Stolpe. Bounds: Expressing Reservations about Incoming Data. Accepted for the Fourth International Workshop on Consuming Linked Data (COLD2013) at the International Semantic Web Conference (ISWC). 2013.
  2. M. G. Skjæveland and A. Stolpe. Bounds: Expressing Reservations about Incoming Data. Position paper for W3C's RDF Validation Workshop—Practical Assurances for Quality RDF Data. 10-11 September 2013, Cambridge, MA, USA. 2013.