Exercises tutorial Semantic Days 2012

For each section there will be about 15 minutes to solve exercises. There is probably more work than 15 minutes allows for, but the exercises vary in difficulty, so choose the level you are comfortable with.

Table of contents:

  1. Installing software
  2. RDF
  3. SPARQL
  4. OWL
  5. D2R

1 Installing software

Your first exercise is to install the software we will be using on your local computer. We will try to have the software available on CDs or Memory sticks.

1.1 Protégé

Install the latest version of Protégé 4.1. Go to Protégé's download site and select the correct version for your system.

1.2 D2R Server

Install Java runtime environment.

Download D2R Server. We will go through the installation in later exercises.

2 RDF

In these exercises we will use the RDF serialisation format Turtle to write RDF.

2.1 Exercise

2.1.1 Getting started

Open a plain text editor of your own choice, e.g., notepad, textpad, gedit, and start the file with the following prefix declarations (ignore the line numbers):

1:  @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
2:  @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
3:  @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
4:  @prefix ex: <http://www.example.org#> .
5:  @prefix w: <http://sws.ifi.uio.no/ont/world.owl#> .

2.1.2 Triples

Continue by adding triples that capture the statements:

  • Norway is called "Norway", using the predicate rdfs:label,
  • Oslo is called "Oslo",
  • Oslo is the capital of Norway—use the predicate w:isCapitalOfCountry,
  • Stavanger is called "Stavanger", and
  • Stavanger is a city in Norway—use the predicate w:isCityInCountry.

Use the namespace prefix ex: for the resources Norway, Oslo and Stavanger, e.g., ex:Norway.

2.1.3 Validate

Validate your finished RDF file using the RDF Validator and Converter. Paste the contents of your RDF file in the text area on the website and set the input format drop-down menu to "Notation 3 (or N-Triples/Turtle)" and click "Validate!". Sort out any errors in your RDF "code" that the validator reports.

2.1.4 Visualise

When your RDF validates, the website will, in addition to giving you a thumbs up, return an RDF/XML rendering of your file. Copy this RDF/XML, open the W3C's RDF validator, and paste the RDF/XML into the text area. Under Display Result Options select "Triples and Graph" and click "Parse RDF".

2.1.5 Solution

The code block below contains a solution to the RDF specification in the exercise—excluding the prefixes which should precede this block.

Each line contains a triple. The first triple is

ex:Norway      rdfs:label            "Norway" .

where ex:Norway is the subject of the triple and rdfs:label and "Norway" are the predicate and object, respectively.

 6:  ex:Norway      rdfs:label               "Norway" .
 7:  
 8:  ex:Oslo        rdfs:label               "Oslo" ;
 9:                 w:isCapitalOfCountry     ex:Norway .
10:  
11:  ex:Stavanger   rdfs:label               "Stavanger" ;
12:                 w:isCityInCountry        ex:Norway .

Semicolon is used as shorthand notation; what follows a semicolon specifies a triple with the same subject as the preceding triple without repeating the subject.

An equivalent representation of the lines 8–9, not using semicolon, would be:

 ex:Oslo        rdfs:label              "Oslo" .
 ex:Oslo        w:isCapitalOfCountry    ex:Norway .

2.2 Exercise

Add more triples expressing that

  • Stavanger is a City, and Rogaland is a Region (use the predicate rdf:type and the resources w:City and w:Region),
  • Rogaland is a region in Norway, and
  • Stavanger is a city in Rogaland.

Create predicates similar to the predicates in the previous exercise, e.g., isCapitalOfCountry to capture the two last bullet points.

Make sure your extended RDF file validates.

2.2.1 Solution

13:  ex:Stavanger    rdf:type                w:City ;
14:                  w:isCityInRegion        ex:Rogaland .
15:  
16:  ex:Rogaland     rdf:type                w:Region ;
17:                  w:isRegionInCountry     ex:Norway .

2.3 Exercise

Further extend your RDF file to contain that:

Norway is a country with a population of 4985870. The head of state is "King Harald V". Norway has two local names, one in the language "Norwegian bokmål" (language code @nb): "Norge", and one in the language "Norwegian nynorsk" (@nn): "Noreg".

Again, create new predicates for the relations between Norway and the information about Norway. It is natural to use literals for the RDF representation of the statements; try also to specify the datatype or language of the literals where appropriate.

2.3.1 Solution

In Turtle a is an abbreviation for rdf:type, so the first line in the block below is equivalent to

 ex:Norway         rdf:type          w:Country ;

The datatype of the literal "4985870" is naturally an integer; this is specified by adding ^^xsd:int after the literal. Similarly, "Kong Harald V" is a string.1 The literals "Norge" and "Noreg" are marked with the language they are written in.

18:  ex:Norway       a                       w:Country ;
19:                  w:hasPopulation         "4985870"^^xsd:int ;
20:                  w:hasHeadOfState        "Kong Harald V"^^xsd:string ;
21:                  w:hasLocalName          "Norge"@nb , 
22:                                          "Noreg"@nn .

As before, what follows a semicolon specifies a triple with the same subject as the preceding triple. Like the semicolon, the comma is shorthand notation; what follows a comma specifies a triple with the same subject and predicate as the preceding triple without repeating the subject and predicate. This means that the last line represents the triple

 ex:Norway           w:hasLocalName          "Noreg"@nn .
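
To make the two shorthands concrete, here is a minimal Java sketch that writes out the full triples which a predicate-object list abbreviates. It is not a Turtle parser; the class name TurtleShorthandDemo and the method expand are invented for this illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration only: expands Turtle's ';' and ',' shorthands into full triples.
final class TurtleShorthandDemo {

    // Each map key is a predicate (one per ';' group in Turtle);
    // its value lists the objects (separated by ',' in Turtle).
    static List<String> expand(String subject, Map<String, List<String>> predicateObjects) {
        List<String> triples = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : predicateObjects.entrySet()) {
            for (String object : entry.getValue()) {
                // Repeat the subject (and predicate) that the shorthand leaves out.
                triples.add(subject + " " + entry.getKey() + " " + object + " .");
            }
        }
        return triples;
    }

    public static void main(String[] args) {
        Map<String, List<String>> po = new LinkedHashMap<>();
        po.put("a", List.of("w:Country"));
        po.put("w:hasPopulation", List.of("\"4985870\"^^xsd:int"));
        po.put("w:hasLocalName", List.of("\"Norge\"@nb", "\"Noreg\"@nn"));
        expand("ex:Norway", po).forEach(System.out::println);
    }
}
```

Running main prints one fully written triple per line; the last one is the w:hasLocalName triple for "Noreg"@nn.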

3 SPARQL

In these exercises we will write SPARQL queries and execute them in a SPARQL query interface located at http://sws.ifi.uio.no/snorql/world/. The dataset which is queried is an RDF representation of a traditional relational database containing facts about countries, cities, continents and so on in the world2, similar to the RDF we wrote in the previous exercise.

By using the web browser interface to the RDF representation, e.g., the page for Stavanger, you can look at the dataset in a human-readable way and see, e.g., what properties the different types of resources have. We will come back to this database system in later exercises.

For each of the exercises below write a SPARQL query which returns the desired result when executed on the endpoint.

3.1 Exercise, Getting started

First, to get you started, open the address http://sws.ifi.uio.no/snorql/world/ in a web browser. In the text area on this page you should see the SPARQL query

SELECT DISTINCT * WHERE {
  ?s ?p ?o
}
LIMIT 10

Press the "Go!" button to execute it.

In less than a second you should see the results of the query execution. The query asks for any 10 distinct triples from the dataset. The result I got was the following table. Note that the results you get might very well not be the same.

s                              p           o
db:District/AFG/Kabol          rdfs:label  "Kabol"
db:District/AFG/Qandahar       rdfs:label  "Qandahar"
db:District/AFG/Herat          rdfs:label  "Herat"
db:District/AFG/Balkh          rdfs:label  "Balkh"
db:District/NLD/Noord-Holland  rdfs:label  "Noord-Holland"
db:District/NLD/Zuid-Holland   rdfs:label  "Zuid-Holland"
db:District/NLD/Utrecht        rdfs:label  "Utrecht"
db:District/NLD/Noord-Brabant  rdfs:label  "Noord-Brabant"
db:District/NLD/Groningen      rdfs:label  "Groningen"
db:District/NLD/Gelderland     rdfs:label  "Gelderland"

3.2 Exercise

List all continents.

Formulated in a more RDF-friendly way, this exercise would be "select everything which is of type world:Continent", or perhaps even more friendly: "select all the subjects of triples where rdf:type is the predicate and world:Continent is the object."

3.2.1 Solution

SELECT ?continent
WHERE {
  ?continent rdf:type world:Continent .
}


3.3 Exercise

What is the name of the capital of Albania?

The predicate world:hasCapital connects a country with its capital. The identifier for the resource Albania is

 <http://sws.ifi.uio.no/d2rq/resource/Country/ALB>

3.3.1 Solution

The exercise was to list the name of the capital (and not its identifier). We therefore first match the identifier of the capital of Albania, which is bound to the variable ?capital, and then match the triple connecting the capital resource to its name, using the predicate world:hasName.

SELECT ?capital_name
WHERE {
  <http://sws.ifi.uio.no/d2rq/resource/Country/ALB> world:hasCapital ?capital .
  ?capital world:hasName ?capital_name .
}

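
The way the two triple patterns are joined through the shared variable ?capital can be sketched in plain Java. This is only an illustration over a small in-memory list, not how a SPARQL engine is implemented; the class CapitalJoinDemo and the shortened identifiers in the sample data are invented for this example, and records assume Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: evaluates two triple patterns joined on ?capital.
final class CapitalJoinDemo {

    record Triple(String s, String p, String o) {}

    // Returns the names of the capital(s) of the given country.
    static List<String> capitalNames(List<Triple> data, String country) {
        List<String> names = new ArrayList<>();
        for (Triple first : data) {
            // Pattern 1: <country> world:hasCapital ?capital
            if (!first.s().equals(country) || !first.p().equals("world:hasCapital")) {
                continue;
            }
            String capital = first.o(); // binding for ?capital
            for (Triple second : data) {
                // Pattern 2: ?capital world:hasName ?capital_name
                if (second.s().equals(capital) && second.p().equals("world:hasName")) {
                    names.add(second.o());
                }
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Sample data; the city identifiers are invented for this example.
        List<Triple> data = List.of(
            new Triple("Country/ALB", "world:hasCapital", "City/example-1"),
            new Triple("City/example-1", "world:hasName", "Tirana"),
            new Triple("City/example-2", "world:hasName", "Oslo"));
        System.out.println(capitalNames(data, "Country/ALB"));
    }
}
```

Only names of resources that are reachable through world:hasCapital from the given country are returned; the name of the unrelated second city is ignored.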

Note that it is not possible to use the prefix

PREFIX worlddata: <http://sws.ifi.uio.no/d2rq/resource/>

to select the resource representing Albania, i.e.,

worlddata:Country/ALB

This is because unescaped forward slashes (/) are not allowed in local names, i.e., the part of the identifier following the prefix.

3.4 Exercise

List all the names of cities which have a population of more than 5,000,000.

The predicate connecting a city to its population is world:hasCityPopulation, and world:hasName connects it to its name.

3.4.1 Solution

The trick here is to use FILTER to restrict the output of a query.

SELECT ?city_name
WHERE {
  ?city a world:City ;
        world:hasCityPopulation ?pop ;
        world:hasName ?city_name .
  FILTER(?pop > 5000000)
}

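
FILTER behaves much like a predicate applied to each candidate solution. A rough Java analogy, with invented city names and populations that are not taken from the world database, and with the hypothetical class name PopulationFilterDemo (records assume Java 16 or later):

```java
import java.util.List;
import java.util.stream.Collectors;

// Illustration only: FILTER keeps the solutions for which the condition holds.
final class PopulationFilterDemo {

    record City(String name, long population) {}

    static List<String> namesOfCitiesAbove(List<City> cities, long threshold) {
        return cities.stream()
            .filter(city -> city.population() > threshold) // FILTER(?pop > threshold)
            .map(City::name)                               // SELECT ?city_name
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up cities and populations, for illustration only.
        List<City> cities = List.of(
            new City("Smallville", 40_000),
            new City("Bigtown", 7_500_000),
            new City("Edgecase", 5_000_000));
        System.out.println(namesOfCitiesAbove(cities, 5_000_000));
    }
}
```

Because the comparison is strict, a city with exactly 5,000,000 inhabitants is not included, just as with ?pop > 5000000 in the query.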

3.5 Exercise

List all the names of Chinese cities which have a population of more than 5,000,000.

The predicate connecting a city to its country is world:isCityInCountry. The identifier for China is

 <http://sws.ifi.uio.no/d2rq/resource/Country/CHN>

3.5.1 Solution

In this query we need to combine the lessons learnt from the two previous queries.

SELECT ?city_name
WHERE {
  ?city a world:City ;
        world:hasCityPopulation ?pop ;
        world:hasName ?city_name ;
        world:isCityInCountry <http://sws.ifi.uio.no/d2rq/resource/Country/CHN> .
  FILTER(?pop > 5000000)
}


3.6 Exercise

List all unique government forms.

3.6.1 Solution

Use DISTINCT to only list the unique answers.

SELECT DISTINCT ?government_form
WHERE {
  ?x world:hasGovernmentForm ?government_form
}


3.7 Exercise

List all the countries that lie in more than one continent.

The predicate connecting a country to its continents is world:isCountryInContinent.

3.7.1 Solution

A query which solves this exercise requires that each output country is connected to two continents and, importantly, that these are not the same continent. This is done in the query by requiring that ?continent1 is different from (!=) ?continent2.

SELECT ?country
WHERE {
  ?country world:isCountryInContinent ?continent1, ?continent2
  FILTER(?continent1 != ?continent2)
}


The result of the query should be empty: in this dataset no country is linked to more than one continent.
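
Since the real data gives an empty result, the logic of the query is easier to see on a small invented example. The Java sketch below pairs every country-continent fact with every other one, which is what matching the same pattern twice amounts to; the class MultiContinentDemo and the sample data are hypothetical, and records assume Java 16 or later.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustration only: a self-join with an inequality condition.
final class MultiContinentDemo {

    record InContinent(String country, String continent) {}

    static Set<String> countriesInSeveralContinents(List<InContinent> facts) {
        // A set removes the duplicates that arise because every match is found
        // twice, once as (continent1, continent2) and once the other way round.
        Set<String> result = new LinkedHashSet<>();
        for (InContinent a : facts) {
            for (InContinent b : facts) {
                boolean sameCountry = a.country().equals(b.country());
                boolean differentContinents = !a.continent().equals(b.continent()); // FILTER(?c1 != ?c2)
                if (sameCountry && differentContinents) {
                    result.add(a.country());
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Invented facts; country X is placed in two continents on purpose.
        List<InContinent> facts = List.of(
            new InContinent("X", "Europe"),
            new InContinent("X", "Asia"),
            new InContinent("Y", "Europe"));
        System.out.println(countriesInSeveralContinents(facts));
    }
}
```

The SPARQL query above has no DISTINCT, so on data like this it would list country X twice; the set in the sketch plays the role that SELECT DISTINCT would play in the query.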

3.8 Exercise

List all continents with the number of countries they contain.

Tip: Use the function count and GROUP BY.

3.8.1 Solution

SELECT ?continent (COUNT(?country) AS ?countries)
WHERE {
  ?continent a world:Continent .
  ?country world:isCountryInContinent ?continent .
}
GROUP BY ?continent

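
Grouping and counting work as in SQL: the solutions are partitioned by the grouping variable and the aggregate is computed per partition. A minimal Java analogy using invented sample facts and the hypothetical class name CountPerContinentDemo (records assume Java 16 or later):

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Illustration only: GROUP BY ?continent with a count per group.
final class CountPerContinentDemo {

    record InContinent(String country, String continent) {}

    static Map<String, Long> countriesPerContinent(List<InContinent> facts) {
        return facts.stream().collect(
            Collectors.groupingBy(
                InContinent::continent,  // GROUP BY ?continent
                TreeMap::new,            // sorted keys, for readable output
                Collectors.counting())); // count of countries in each group
    }

    public static void main(String[] args) {
        // Invented facts, for illustration only.
        List<InContinent> facts = List.of(
            new InContinent("A", "Europe"),
            new InContinent("B", "Europe"),
            new InContinent("C", "Asia"));
        System.out.println(countriesPerContinent(facts));
    }
}
```

As in the query, a continent without any country produces no group at all rather than a group with count zero.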

3.9 Exercise

List all countries which are not independent, i.e., have no independence year (world:hasIndependenceYear).

3.9.1 Solution

This used to be a tricky SPARQL query. NOT EXISTS is not part of the first version of the SPARQL language, a choice motivated by what is known as the open world assumption, so this had no straightforward solution of the kind SQL offers. On the one hand, the question does not make sense in an open world: a missing triple only means that a fact is not known, so there is an infinite number of possibilities to check before a positive answer can be definitive. On the other hand, we are querying a finite set of triples, and it is reasonable to ask whether something in the data lacks some property, e.g., which countries do not have a year of independence.

However, in the new SPARQL standard NOT EXISTS is part of the language, written as FILTER NOT EXISTS, and works as expected, so the solution is simply:

SELECT ?country_name
WHERE {
  ?country a world:Country ;
           world:hasName ?country_name .
  FILTER NOT EXISTS {
    ?country world:hasIndependenceYear ?year .
  }
}
ORDER BY ?country_name

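
Over a finite set of facts, the NOT EXISTS test amounts to checking that no matching fact is present. The following Java sketch shows this closed-world reading on invented data; the class NotExistsDemo and the country names are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Illustration only: "no independence year is recorded" over a finite data set.
final class NotExistsDemo {

    static List<String> withoutIndependenceYear(List<String> countries,
                                                Map<String, Integer> independenceYear) {
        return countries.stream()
            // NOT EXISTS { ?country world:hasIndependenceYear ?year }
            .filter(country -> !independenceYear.containsKey(country))
            .sorted() // ORDER BY ?country_name
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Invented countries and years, for illustration only.
        List<String> countries = List.of("C-land", "A-land", "B-land");
        Map<String, Integer> independenceYear = Map.of("B-land", 1900);
        System.out.println(withoutIndependenceYear(countries, independenceYear));
    }
}
```

The answer only says that no year is recorded in the data at hand; under the open world assumption it does not follow that the country has none.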

3.10 Exercise

List all unique government forms with the country which has the maximum value of GNP for this government form. Order the output by the GNP value, the maximum on top.

3.10.1 Solution

This is similar to the NOT EXISTS solution above. The first version of SPARQL has no MAX, so there we need to select the countries for which there is no other country with the same form of government and a greater GNP. SPARQL 1.1 provides MAX as an aggregate function.

Sorting of the output is done with ORDER BY followed by the variables to be sorted. Descending sorting order is achieved with DESC.

SPARQL 1.0:

SELECT ?government ?country ?gnp
WHERE {
  ?country world:hasGovernmentForm ?government ;
           world:hasGNP ?gnp .
  OPTIONAL {
    ?other_country world:hasGovernmentForm ?government ;
                   world:hasGNP ?other_gnp .
    FILTER (?gnp < ?other_gnp)
  }
  FILTER (!bound(?other_country))
}
ORDER BY DESC(?gnp)


SPARQL 1.1 (note that this version returns the maximum GNP for each government form, but not the country which has it):

SELECT ?government (MAX(?gnp) AS ?max)
WHERE {
  ?country world:hasGovernmentForm ?government ;
           world:hasGNP ?gnp .
}
GROUP BY ?government
ORDER BY DESC(?max)
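
The idea behind the first query, keeping a country only if no other country with the same government form has a greater GNP, can be sketched in Java as follows. The class MaxGnpPerGovernmentDemo and all sample values are invented, and records assume Java 16 or later. One difference is worth noting: on a tie the sketch keeps a single country, whereas the OPTIONAL/!bound query would return all tied countries.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustration only: the country with the greatest GNP per government form.
final class MaxGnpPerGovernmentDemo {

    record Country(String name, String government, double gnp) {}

    static List<Country> richestPerGovernment(List<Country> countries) {
        Map<String, Country> best = new HashMap<>();
        for (Country candidate : countries) {
            // Replace the current best only if the candidate has a greater GNP.
            best.merge(candidate.government(), candidate,
                (current, next) -> next.gnp() > current.gnp() ? next : current);
        }
        List<Country> result = new ArrayList<>(best.values());
        result.sort(Comparator.comparingDouble(Country::gnp).reversed()); // ORDER BY DESC(?gnp)
        return result;
    }

    public static void main(String[] args) {
        // Invented countries and GNP values, for illustration only.
        List<Country> countries = List.of(
            new Country("A", "Republic", 10.0),
            new Country("B", "Republic", 30.0),
            new Country("C", "Monarchy", 20.0));
        richestPerGovernment(countries).forEach(System.out::println);
    }
}
```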

4 OWL

In these exercises we will create an ontology which defines some of the vocabulary used in the world database we queried in the previous exercise.

The first exercise consists of some simple modelling tasks in Protégé: making classes and subclasses, setting domain and range, and so on.

4.1 Exercise

This exercise is a walk-through of how to get started with creating and editing ontologies in Protégé, showing the basic concepts.3

4.1.1 Getting started with Protégé

  1. Open Protégé and choose to "Create new OWL ontology".
  2. Set the Ontology IRI to http://sws.ifi.uio.no/ont/world.owl.
  3. Choose a location on your local computer to save your ontology, anywhere will do.
  4. Set the Ontology Format to RDF/XML.

4.1.2 Create classes, object properties and data properties

To create a class, select the "Classes" tab, select the class "Thing" and click the "Add subclass" button immediately above "Thing". Create new subclasses of Thing:

  • Country,
  • City and
  • Region.

Repeat the process for object properties and data properties:

Object properties:

  • isCityInCountry,
  • isCapitalOfCountry,
  • isCityInRegion and
  • isRegionInCountry.

Data properties:

  • hasPopulation,
  • hasHeadOfState and
  • hasLocalName.

4.1.3 Creating subclasses and subproperties

State that a capital of a country is always a city in the same country.

This can be done by making isCapitalOfCountry a subproperty of isCityInCountry. In Protégé it is done by selecting isCapitalOfCountry and adding isCityInCountry as a superproperty, or by dragging isCapitalOfCountry onto isCityInCountry in the Object property hierarchy frame.

The process of creating subclasses and subproperties for data properties is similar.
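
What a reasoner concludes from a subproperty axiom can be sketched in a few lines of Java: every triple that uses the subproperty also holds for the superproperty. This is only an illustration of that single inference rule, not of how Protégé or an OWL reasoner works internally. The class SubPropertyDemo is invented; the sketch assumes that each property has at most one direct superproperty and that the hierarchy has no cycles (records assume Java 16 or later).

```java
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Illustration only: if P is a subproperty of Q, then (s P o) implies (s Q o).
final class SubPropertyDemo {

    record Triple(String s, String p, String o) {}

    // superPropertyOf maps a property to its direct superproperty (assumed acyclic).
    static Set<Triple> infer(Set<Triple> asserted, Map<String, String> superPropertyOf) {
        Set<Triple> all = new LinkedHashSet<>(asserted);
        for (Triple t : asserted) {
            String property = superPropertyOf.get(t.p());
            while (property != null) { // walk up the property hierarchy
                all.add(new Triple(t.s(), property, t.o()));
                property = superPropertyOf.get(property);
            }
        }
        return all;
    }

    public static void main(String[] args) {
        Set<Triple> asserted = Set.of(
            new Triple("ex:Oslo", "w:isCapitalOfCountry", "ex:Norway"));
        Map<String, String> superPropertyOf =
            Map.of("w:isCapitalOfCountry", "w:isCityInCountry");
        infer(asserted, superPropertyOf).forEach(System.out::println);
    }
}
```

With the axiom from this exercise, the asserted fact that Oslo is the capital of Norway yields the additional triple stating that Oslo is a city in Norway.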

Create a new class "CityState", and make it a subclass of both Country and City.4

4.1.4 Set domain and range for property

Specify the correct domain and range for the object property isCityInCountry. The domain should be City and the range should be Country. In Protégé, select the property and add domain and range in the Description frame.

Specify also the correct domain and range for isRegionInCountry and hasHeadOfState.

4.1.5 Disjoint classes

State that a city is not a region, and vice versa. This is done by making the two classes disjoint. Disjoint classes cannot share any members. In Protégé select one of the classes and add the other class as a disjoint class.

4.1.6 Adding more restrictions

State that a city lies in exactly one country. Specify this by adding an anonymous superclass to City. Select City, and click to add a new superclass. In the box that appears, select the "Class expression editor" and write (remember that names are case-sensitive):

isCityInCountry exactly 1 Country

State also that a city lies in not more than one region. Tip: use max.

4.2 Exercise

If you have skipped the previous exercise, you can get up to speed by downloading the ontology file world.1.owl.

Download and open the dump of the RDF world database in Protégé. Import the ontology you created in the previous exercise. Add more axioms to the ontology:

  1. Create a new object property hasCapital and state that it is the inverse of isCapitalOfCountry.
  2. Define the class Capital such that it contains all capitals. In the reasoner menu, select a reasoner, wait for the reasoner to calculate classifications and check if all capitals are inferred as members of Capital.
  3. Define the class Metropolis such that it contains all cities with a population of more than 1,000,000. Apply reasoning and see by the results of the classification if you have modelled correctly.
  4. State that isCountryInContinent is the property chain
    isCountryInRegion o isRegionInContinent
    
  5. Define a class DevelopingCountry as a country which has low life expectancy, e.g., 45 years, and a low GNP, e.g., 10000. Apply reasoning and see if it looks correct.
  6. Define a class DevelopedCountry to be the class of countries which are not a developing country. Again, use reasoning to check the results; are they what you expected?
  7. Set hasGNP and hasLifeExpectancy to be functional properties. Apply reasoning5, check the members of the class DevelopedCountry and explain the effects.
  8. Define AmericanCity as a city which lies on the American continent.
  9. Can you add the necessary axiom(s) such that Singapore is inferred as a member of CityState?
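
The property chain in item 4 says that following isCountryInRegion and then isRegionInContinent gives isCountryInContinent. A minimal Java sketch of this composition, under the simplifying assumption that each country has one region and each region one continent; the class PropertyChainDemo and the sample values are invented for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

// Illustration only: composing two properties, as in the property chain
// isCountryInRegion o isRegionInContinent.
final class PropertyChainDemo {

    static Map<String, String> compose(Map<String, String> countryInRegion,
                                       Map<String, String> regionInContinent) {
        Map<String, String> countryInContinent = new TreeMap<>();
        for (Map.Entry<String, String> entry : countryInRegion.entrySet()) {
            String continent = regionInContinent.get(entry.getValue());
            if (continent != null) { // the chain applies only if both links exist
                countryInContinent.put(entry.getKey(), continent);
            }
        }
        return countryInContinent;
    }

    public static void main(String[] args) {
        // Sample values chosen for illustration only.
        Map<String, String> countryInRegion = Map.of("Norway", "Nordic Countries");
        Map<String, String> regionInContinent = Map.of("Nordic Countries", "Europe");
        System.out.println(compose(countryInRegion, regionInContinent));
    }
}
```

A country whose region has no known continent gets no inferred continent, which mirrors how the chain only fires when both links are present.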

4.2.1 Solution

Solutions to all, except the last, of the modelling exercises are found in http://sws.ifi.uio.no/ont/world.owl.

The answer to the last question is that it is not possible to make a definition that would force Singapore to be a member, except of course by adding Singapore explicitly. There is not enough information in the dataset to create such a definition.

5 D2R

5.1 Exercise: Set up a D2R server

This is a walk-through of how to set up a D2R server for the world database. The steps are tested on both Windows Vista™ Ultimate and Ubuntu Linux.

  1. Download the D2R software: http://d2rq.org/
  2. In the meantime, read the getting-started guide at http://d2rq.org/getting-started. These are the instructions we will follow.
  3. Extract the downloaded archive into a suitable location.
  4. To be able to translate the data from a relational database format to RDF, the D2R server needs a mapping. Luckily, D2R is capable of generating a mapping based on the database schema. Change into the D2R Server directory and run:
    generate-mapping -o mapping.n3 -u testinf3580 -p testinf3580 jdbc:mysql://db4free.net/testinf3580
    

    The D2R server connects to the MySQL database and creates a mapping based on the database schema. This may take a few seconds because of the network communication with the database server.

  5. The command writes the generated mapping to the file mapping.n3.
  6. Start the D2R server with the command
    d2r-server mapping.n3
    
  7. Wait until you get the message
    [[[ Server started at http://localhost:2020/ ]]]
    
  8. Open http://localhost:2020 in your web browser.
  9. That's it!

Note that the D2R server you have just set up will be slower than the server at sws.ifi.uio.no. Your server needs to communicate over the Internet with the external database, while the D2R server at sws.ifi.uio.no communicates with a local database. It is quite easy to set up a MySQL database running on your local computer. A dump of the world database can be downloaded from sws.ifi.uio.no.

5.2 Exercise

You will notice that the data in the server you have set up differs from the data in the server running on http://sws.ifi.uio.no/d2rq/. This is because we have changed the generated mapping file slightly, extracting continents, districts and regions into their own classes, changing property names and adding datatypes to literals.

You can change the mapping yourself by following the specification of the D2RQ Mapping Language.

The current mapping file for the world database at http://sws.ifi.uio.no/d2rq/ is sws_mapping.n3, which is available for download.

Download the mapping file and restart your D2R server with this mapping.

5.3 Exercise

Download the jar file D2RQueryEngine from the download catalogue on the tutorial homepage. The program reads a D2R server dataset and an ontology and applies reasoning to the combined knowledge base. Then, a query is sent to the dataset with the inferred triples and output is written to the console/standard out. The program reads a D2R mapping file, an ontology file and a query, and is executed like this:

java -jar D2RQueryEngine mapping.n3 http://sws.ifi.uio.no/ont/world.owl query.rq

Write a query which lists all capitals which are metropolises and run the query with the D2RQueryEngine as shown above.

5.3.1 Solution

PREFIX world: <http://sws.ifi.uio.no/ont/world.owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?name
WHERE {
  ?x a world:Capital, world:Metropolis ;
     rdfs:label ?name .
}

5.3.2 D2RQueryEngine

The Java code for the D2RQueryEngine program is listed below. If you know a little Java you will see that it is quite simple to get started with programming with semantic technologies.

Import the necessary external classes. All but the last two are Jena's. The last two make it possible to connect to a D2R server and to use the Pellet reasoner, respectively.

import com.hp.hpl.jena.ontology.*;
import com.hp.hpl.jena.query.*;
import com.hp.hpl.jena.reasoner.*;
import com.hp.hpl.jena.rdf.model.*;
import com.hp.hpl.jena.util.*;
import de.fuberlin.wiwiss.d2rq.ModelD2RQ;
import org.mindswap.pellet.jena.PelletReasonerFactory;

public class D2RQueryEngine {

A method which creates a Jena model, i.e., a representation of an RDF graph, by reading from file:

  public Model readModel(String file) {
    return FileManager.get().loadModel(file);
  }

A method which takes a query object and a model object, runs the query against the model according to the type of the query (SELECT, CONSTRUCT or ASK), and prints the results accordingly:

  protected void queryModel(Query query, Model model) {
    QueryExecution qexec = QueryExecutionFactory.create(query, model);
    try {
      if (query.isSelectType()) {
        // SELECT: print the result table
        ResultSet rs = qexec.execSelect();
        ResultSetFormatter.out(rs, query);
      } else if (query.isConstructType()) {
        // CONSTRUCT: print the resulting graph as Turtle
        Model result = qexec.execConstruct();
        result.write(System.out, "TTL");
      } else if (query.isAskType()) {
        // ASK: print true or false
        boolean result = qexec.execAsk();
        System.out.println(result);
      } else {
        System.err.println("Error!");
      }
    } finally {
      // Release the resources held by the query execution
      qexec.close();
    }
  }

A method which runs the whole shebang:

  1. Reads input, i.e., the path to a D2R mapping file, an ontology and a query.
  2. Creates a model with an attached Pellet reasoner and the ontology.
  3. Sets the type of ontology model.
  4. Adds the D2R data, "automatically" causing reasoning and inferred triples to be added to the model.
  5. Queries the model.

  public void run(String d2r, String ont, String q) {

    // Read input
    Model d2rData = new ModelD2RQ(readModel(d2r), null);
    Model ontData = readModel(ont);
    Query query = QueryFactory.read(q);

    // Create ontology model
    Reasoner reasoner = PelletReasonerFactory.theInstance().create();
    InfModel infModel = ModelFactory.createInfModel(reasoner, ontData);
    OntModelSpec spec = new OntModelSpec(OntModelSpec.OWL_MEM);
    OntModel ontModel = ModelFactory.createOntologyModel(spec, infModel);

    // Add d2r data
    ontModel.add(d2rData);

    // Query model and write results to stdout
    queryModel(query, ontModel);
  }

main:

  public static void main(String[] args) {
    D2RQueryEngine dave = new D2RQueryEngine();
    dave.run(args[0], args[1], args[2]);
  }
} // end class

Footnotes:

1 We could have chosen to represent Kong Harald V as a resource, but we chose not to.

2 The database is the same that MySQL provides to their users for experimentation, see http://dev.mysql.com/doc/world-setup/en/world-setup.html. The sample data used in the world database is Copyright Statistics Finland, http://www.stat.fi/worldinfigures.

3 An in-depth tutorial for Protégé is developed by the University of Manchester and is available online: http://owl.cs.manchester.ac.uk/tutorials/protegeowltutorial/

4 A possible member of this class could be Singapore, although in the world database the city Singapore and the country Singapore are not the same individual.

5 In my experience, the reasoner FaCT++ tackled this task better than Pellet.

Author: Martin G. Skjæveland

Date: 2012-05-08 10:04:53 CEST
