Semantic Web 101: SPARQL Query using Endpoints

Published Aug 29, 2016. Last updated Jan 18, 2017.

Semantic Web Background

The main purpose of the Semantic Web is to connect the data—not just the documents or high-level links between applications.
The Semantic Web abstracts away the document and application layers involved in the exchange of information.
It connects facts, so that rather than linking to a particular document or application, you can instead refer to a specific piece of information.
If that information is updated, you can automatically take advantage of the newly up-to-date information.

To create Linked Data, existing data must be made available in a standard format (RDF), either by converting it or by providing on-the-fly access to existing databases (relational, XML, HTML, etc.). It is also important to be able to set up query (SPARQL) endpoints to access that data more conveniently.

What is RDF?

Resource Description Framework or RDF is:

  • The data modeling language for the Semantic Web
  • A directed, labeled graph data format for representing information

The basic unit of information (a fact) is represented as a subject, a predicate, and an object.
A fact is known as a triple, and together the triples form a graph that connects pieces of data. You can therefore think of RDF as a bunch of nodes (the dots) attached to each other by edges (the lines), where both the nodes and the edges have labels.

RDF Triple Example

There are three kinds of nodes in RDF Graphs:

  • Resource Nodes (IRIs), which represent anything that can have things said about it. Each one is identified by an IRI (Internationalized Resource Identifier) within an RDF graph.
  • Literal Nodes, which are used for values such as strings, numbers, or dates. The term literal is a fancy word for value.
  • Blank Nodes, which are resources without an IRI.
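
To make the three kinds of nodes concrete, here is a minimal sketch that classifies a term by the way it is written in the N-Triples serialization: IRIs are enclosed in angle brackets, blank nodes start with `_:`, and literals start with a double quote. The class and method names (`NodeKinds`, `kindOf`) are made up for this illustration and are not part of Jena.

```java
// Sketch only: NodeKinds and kindOf are hypothetical names, not a Jena API.
class NodeKinds {

    /** Classifies an RDF term written in N-Triples syntax. */
    static String kindOf(String term) {
        if (term.startsWith("<") && term.endsWith(">")) {
            return "IRI";        // e.g. <http://dbpedia.org/resource/Markiplier>
        }
        if (term.startsWith("_:")) {
            return "BLANK";      // e.g. _:b0
        }
        if (term.startsWith("\"")) {
            return "LITERAL";    // e.g. "Markiplier"@en
        }
        throw new IllegalArgumentException("Not an N-Triples term: " + term);
    }

    public static void main(String[] args) {
        System.out.println(kindOf("<http://dbpedia.org/resource/Markiplier>"));
        System.out.println(kindOf("\"Markiplier\"@en"));
        System.out.println(kindOf("_:b0"));
    }
}
```

In a real application you would not parse strings like this; an RDF library already knows the kind of each node it returns.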

RDF Example Graph

What is SPARQL?

SPARQL (pronounced "sparkle"), a recursive acronym for SPARQL Protocol and RDF Query Language, is:

  • The query language of the Semantic Web
  • A way to express queries across diverse data sources
  • Capable of querying required and optional graph patterns along with their conjunctions and disjunctions

Just as relational databases or XML need specific query languages (SQL and XQuery, respectively), the Semantic Web (typically represented using RDF as a data format) needs its own RDF-specific query language and facilities. This is provided by the SPARQL query language and the accompanying protocols.

Technically, SPARQL queries are based on (triple) patterns. RDF can be seen as a set of relationships among resources (i.e., RDF triples); SPARQL queries provide one or more patterns of such relationships. These triple patterns are similar to RDF triples, except that one or more of the constituent resource references are variables. A SPARQL engine would return the resources for all triples that match these patterns.
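
The idea of matching a triple pattern can be sketched in a few lines of plain Java. This is a toy matcher over an in-memory list, not how a real SPARQL engine is implemented: triples are three-element string arrays, any pattern position that starts with `?` is treated as a variable, and the `ex:` terms are invented sample data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch only: a toy matcher for a single triple pattern.
class TriplePatternDemo {

    /** Returns one set of variable bindings per triple that matches the pattern. */
    static List<Map<String, String>> match(String[] pattern, List<String[]> graph) {
        List<Map<String, String>> solutions = new ArrayList<>();
        for (String[] triple : graph) {
            Map<String, String> binding = new LinkedHashMap<>();
            boolean matches = true;
            for (int i = 0; i < 3 && matches; i++) {
                if (pattern[i].startsWith("?")) {
                    // A variable matches any term, but a repeated variable
                    // must be bound to the same term every time.
                    String previous = binding.putIfAbsent(pattern[i], triple[i]);
                    matches = previous == null || previous.equals(triple[i]);
                } else {
                    // A constant must be identical to the term in the triple.
                    matches = pattern[i].equals(triple[i]);
                }
            }
            if (matches) {
                solutions.add(binding);
            }
        }
        return solutions;
    }

    public static void main(String[] args) {
        // Invented sample data, written as prefixed names for brevity
        List<String[]> graph = Arrays.asList(
            new String[] {"ex:Markiplier", "ex:alias", "\"Mark\""},
            new String[] {"ex:Markiplier", "ex:type", "ex:Person"},
            new String[] {"ex:Other", "ex:type", "ex:Person"});

        // Everything said about ex:Markiplier
        System.out.println(match(new String[] {"ex:Markiplier", "?p", "?o"}, graph));
    }
}
```

A full query usually combines several such patterns and joins the bindings they share, which is the work the SPARQL engine does for you.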

Quick Notes

  1. The term "Semantic Web" refers to W3C's vision of the web of linked data.

  2. Graph Databases and other non-relational databases are on the rise as reported by "Forrester Research, Enterprise DBMS Report".

  3. It is recommended that you read the SPARQL Recommendation before or after this tutorial to get a better understanding of the example queries.

Creating the Java Code to Query Endpoints

We will create a simple Java application which queries an endpoint using Apache Jena—a Java framework for building Semantic Web and Linked Data applications. In this example, I use the libraries from Apache Jena version 3.1 found inside the lib directory of the binary download as shown below (Copy this folder into your working directory).

Terminal Output for lib folder

Copy the lib folder from Apache Jena into your working folder (I named mine SPARQL101). Next, create a file called log4j.properties in that folder with the following content.

log4j.rootLogger=INFO, stdlog

log4j.appender.stdlog=org.apache.log4j.ConsoleAppender
log4j.appender.stdlog.target=System.err
log4j.appender.stdlog.layout=org.apache.log4j.PatternLayout
log4j.appender.stdlog.layout.ConversionPattern=%d{HH:mm:ss} %-5p %-20c{1} :: %m%n

## Execution logging
log4j.logger.org.apache.jena.arq.info=INFO
log4j.logger.org.apache.jena.arq.exec=INFO

## TDB loader
log4j.logger.org.apache.jena.tdb.loader=INFO
## TDB syslog.
log4j.logger.TDB=INFO

## Everything else in Jena
log4j.logger.org.apache.jena=WARN
log4j.logger.org.openjena=WARN
log4j.logger.org.openjena.riot=INFO

This file will prevent you from getting the "log4j:WARN No appenders could be found" warning, which gives headaches to newcomers when dealing with applications that use this logger.

And finally, we will create our Java code as shown below.

import org.apache.jena.query.*;
import org.apache.jena.sparql.engine.http.QueryEngineHTTP;

import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Iterator;

/**
 * Created by Isai B. Cicourel
 */
public class QueryExample {


    /**
     * Query an endpoint using the given SPARQL query
     * @param szQuery the SPARQL SELECT query to execute
     * @param szEndpoint the URL of the SPARQL endpoint
     * @throws Exception if the query cannot be parsed or executed
     */
    public void queryEndpoint(String szQuery, String szEndpoint)
    throws Exception
    {
        // Create a Query with the given String
        Query query = QueryFactory.create(szQuery);

        // Create the query execution for the given endpoint
        QueryExecution qexec = QueryExecutionFactory.sparqlService(
                szEndpoint, query);

        try {
            // Send a "timeout" request parameter to the endpoint
            // (a server-side hint; support depends on the endpoint)
            ((QueryEngineHTTP)qexec).addParam("timeout", "10000");

            // Execute Query
            int iCount = 0;
            ResultSet rs = qexec.execSelect();
            while (rs.hasNext()) {
                // Get Result
                QuerySolution qs = rs.next();

                // Get Variable Names
                Iterator<String> itVars = qs.varNames();

                // Count
                iCount++;
                System.out.println("Result " + iCount + ": ");

                // Display Result
                while (itVars.hasNext()) {
                    String szVar = itVars.next();
                    String szVal = qs.get(szVar).toString();

                    System.out.println("[" + szVar + "]: " + szVal);
                }
            }
        }
        finally {
            // Release the resources held by the query execution
            qexec.close();
        }
    } // End of Method: queryEndpoint()


    public static void main(String[] args) throws IOException {
        // SPARQL Query
        String szQuery = "select * where {?Subject ?Predicate ?Object} LIMIT 1";

        // Arguments
        if (args != null && args.length == 1) {
            szQuery = new String(
                    Files.readAllBytes(Paths.get(args[0])),
                    Charset.defaultCharset());
        }

        // DBPedia Endpoint
        String szEndpoint = "http://dbpedia.org/sparql";

        // Query DBPedia
        try {
            QueryExample q = new QueryExample();
            q.queryEndpoint(szQuery, szEndpoint);
        }
        catch (Exception ex) {
            System.err.println(ex);
        }
    }
}

Once we have the libraries, the Java Code, and the log4j configuration file, our working folder should look like the one below.

File System Explorer

At this point, we are ready to compile and run the code. To do so, we use the Java compiler and Java executable as shown below.

Compile Java

In this example code, we query the endpoint offered by DBPedia to retrieve the first Triple (Fact) from it. At this moment, the result is not yet meaningful since you are probably new to SPARQL.
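
Under the SPARQL protocol, a query can be sent to an endpoint as an HTTP GET request that carries the query text in a URL-encoded `query` parameter. The sketch below builds such a URL with the standard library only, so you can paste it into a browser to see what an endpoint returns. Whether Jena issues exactly this request for a given query is an assumption; `SparqlUrlSketch` and `buildQueryUrl` are hypothetical names.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Sketch only: builds the URL of a SPARQL protocol GET request.
class SparqlUrlSketch {

    /** Appends the URL-encoded query to the endpoint as the "query" parameter. */
    static String buildQueryUrl(String endpoint, String query) {
        try {
            return endpoint + "?query=" + URLEncoder.encode(query, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(buildQueryUrl(
            "http://dbpedia.org/sparql",
            "select * where {?Subject ?Predicate ?Object} LIMIT 1"));
    }
}
```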

Windows Users:

The classpath separator is a semicolon (;) and not a colon (:) like in Unix-like systems. To compile, you need to replace the colon in the classpath as follows:

javac -cp .;"lib/*" QueryExample.java 

And to run:

java -cp .;"lib/*" QueryExample 
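
If you ever assemble a classpath from Java code (for example, when launching another JVM), you can avoid hard-coding the separator: `java.io.File.pathSeparator` holds `;` on Windows and `:` on Unix-like systems. A minimal sketch, with `ClasspathSketch` as a hypothetical helper name:

```java
import java.io.File;

// Sketch only: joins classpath entries with the platform's separator.
class ClasspathSketch {

    static String classpath(String... entries) {
        return String.join(File.pathSeparator, entries);
    }

    public static void main(String[] args) {
        // Prints the run command with the separator of the current platform
        System.out.println("java -cp \"" + classpath(".", "lib/*") + "\" QueryExample");
    }
}
```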

SPARQL Query Structure

A SPARQL query comprises the following, in this order:

  • Prefix declarations, for abbreviating URIs
  • A result clause, identifying what information to return from the query
  • Dataset definition, stating what RDF graph(s) are being queried
  • The query pattern, specifying what to query for in the underlying dataset
  • Query modifiers, slicing, ordering, and otherwise rearranging query results

# prefix declarations
PREFIX foo: <http://example.com/resources/>
...
# result clause
SELECT ...
# dataset definition
FROM ...
# query pattern
WHERE {
    ...
}
# query modifiers
ORDER BY ...
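
As a small illustration of how the parts fit together, the sketch below joins them into one query string. It follows the SPARQL grammar, in which the result clause (SELECT) precedes the optional dataset clause (FROM), and it skips any part that is not supplied. `QueryAssemblySketch` and `assemble` are hypothetical names used only for this example.

```java
// Sketch only: assembles the parts of a SELECT query in grammar order.
class QueryAssemblySketch {

    /** Joins the non-empty parts with newlines; null or empty parts are skipped. */
    static String assemble(String prefixes, String resultClause, String dataset,
                           String pattern, String modifiers) {
        String[] parts = {
            prefixes, resultClause, dataset, "WHERE { " + pattern + " }", modifiers
        };
        StringBuilder sb = new StringBuilder();
        for (String part : parts) {
            if (part != null && !part.isEmpty()) {
                if (sb.length() > 0) {
                    sb.append('\n');
                }
                sb.append(part);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(assemble(
            "PREFIX dbpr: <http://dbpedia.org/resource/>",
            "SELECT ?Predicate ?Object",
            null, // no FROM clause: the endpoint's default dataset is queried
            "dbpr:Markiplier ?Predicate ?Object",
            "LIMIT 5"));
    }
}
```

The resulting string can be saved to a text file and passed to the QueryExample program as its argument.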

Real Life Examples with Markiplier

If you enjoy YouTube entertainment as much as I do, you may know the "Let's Play" videos from Markiplier, which I find funny and entertaining.

Markiplier Twitter

Picture from Markiplier's Twitter

Now let's discover some information about Markiplier using DBPedia and SPARQL. We will start with the most basic query, one that returns all available information, and explain the SPARQL as we go.

# Example Query 1
prefix dbpo: <http://dbpedia.org/ontology/> 
prefix dbpr: <http://dbpedia.org/resource/>

select distinct ?Predicate ?Object where {
  ?Subject ?Predicate ?Object  
  filter(?Subject = dbpr:Markiplier)
}

Prefix: In this query, we use prefixes which are a way of simplifying our query, i.e. instead of writing the full-blown URI <http://dbpedia.org/resource/Markiplier>, we define the prefix dbpr to represent <http://dbpedia.org/resource/>. This will allow us to write dbpr:Markiplier instead. Prefixes are useful to have cleaner and shorter queries that are easier to read.
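
Prefix expansion is plain string substitution, which the following sketch shows with a map from prefix to namespace. The helper names (`PrefixSketch`, `expand`) are hypothetical; a SPARQL engine performs this step for you when it parses the query.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: expands a prefixed name such as dbpr:Markiplier to a full IRI.
class PrefixSketch {

    static String expand(String prefixedName, Map<String, String> prefixes) {
        int colon = prefixedName.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("Not a prefixed name: " + prefixedName);
        }
        String namespace = prefixes.get(prefixedName.substring(0, colon));
        if (namespace == null) {
            throw new IllegalArgumentException("Unknown prefix in: " + prefixedName);
        }
        // The local part is appended to the namespace as-is
        return namespace + prefixedName.substring(colon + 1);
    }

    public static void main(String[] args) {
        Map<String, String> prefixes = new HashMap<>();
        prefixes.put("dbpo", "http://dbpedia.org/ontology/");
        prefixes.put("dbpr", "http://dbpedia.org/resource/");
        System.out.println(expand("dbpr:Markiplier", prefixes));
    }
}
```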

Variables: To define the pattern that we will search, we use variables, which are marked by a question mark at the beginning of the variable name ("?" is not part of the variable name), e.g. ?Subject, which represents the subject we are looking for.

Filter: Filters are a powerful tool to determine exactly what we are searching for. In this case, we specify that we want all triples (Facts) where the subject equals Markiplier.

To run this query, create a text file called Query1.txt with the example query and run it specifying the location as an argument (as shown below).

Example Query 1

Once executed, you will get the following results.

Example Query 1 Results

If you look through the results, you will find information about him, like his Gold Play Button or alias. But what if we want only concrete values to be displayed? We can restrict the results to objects that are literals (remember, literal is a fancy word for value; in this case, the concrete information we might find useful).

# Example Query 2
prefix dbpo: <http://dbpedia.org/ontology/> 
prefix dbpr: <http://dbpedia.org/resource/>

select distinct ?Predicate ?Object where {
  ?Subject ?Predicate ?Object  
  filter(?Subject = dbpr:Markiplier && isLiteral(?Object))
}

Again, we create a new text file named Query2.txt and execute our code.

Example Query 2

Now you can see more concrete information, like his birthday, birth name, or alternative names. But what if we just want his abstract, and only in English (his information is also available in Spanish and Arabic)? To do so, we add a language match that specifies "EN", which stands for English, and use the abstract IRI as the predicate.

# Example Query 3
prefix dbpo: <http://dbpedia.org/ontology/> 
prefix dbpr: <http://dbpedia.org/resource/>

select ?Object where {
  ?Subject dbpo:abstract ?Object  
  filter(?Subject = dbpr:Markiplier && langMatches( lang(?Object), "EN" ))
}

By creating a Query3.txt and executing it, we get what we are looking for: Markiplier's abstract, retrieved directly from the Semantic Web.

Example Query 3
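
The language match used in the filter is case-insensitive and also accepts regional variants, which is why "EN" matches literals tagged en as well as en-US. The sketch below mimics that basic matching rule in plain Java so you can see which tags pass. It is an illustration of the rule, not the code a SPARQL engine runs, and `LangMatchSketch` is a hypothetical name.

```java
import java.util.Locale;

// Sketch only: basic language-range matching as used by langMatches().
class LangMatchSketch {

    /** True if the language tag matches the range, ignoring case. */
    static boolean langMatches(String languageTag, String range) {
        String tag = languageTag.toLowerCase(Locale.ROOT);
        String wanted = range.toLowerCase(Locale.ROOT);
        if (wanted.equals("*")) {
            // "*" matches every literal that has a language tag at all
            return !tag.isEmpty();
        }
        // Exact match, or the range followed by a subtag (en matches en-US)
        return tag.equals(wanted) || tag.startsWith(wanted + "-");
    }

    public static void main(String[] args) {
        System.out.println(langMatches("en", "EN"));    // English literal
        System.out.println(langMatches("en-US", "EN")); // regional variant
        System.out.println(langMatches("es", "EN"));    // Spanish literal
    }
}
```

A literal without a language tag has the empty string as its language, so it passes neither "EN" nor "*".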

Troubleshooting

Many factors affect the querying of data; remember, we are using the Web. Some common issues are listed below:

  • Behind a Proxy: If you are behind a firewall and know the Proxy, you can add the following parameter when executing Java (just change the values to the ones from your Corporate Network or School, since this is just an illustrative example):
java -cp .:"lib/*" \
 -Dhttp.proxyHost=proxy.example.com \
 -Dhttp.proxyPort=80 \
 -Dhttp.nonProxyHosts="127.0.0.1" \
 QueryExample
  • Minor Major Version ("Unsupported major.minor version" error): The Jena API needs Java 8 or later to run. Make sure a Java 8 (or later) installation is the one found first on your operating system's PATH.

  • No Class Definition Found: There are two possible issues; you don't have the lib folder with all the needed Jars, or you didn't specify the classpath with the following parameter:

java -cp .:"lib/*" QueryExample
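
The proxy settings from the first item can also be set from code instead of on the command line, since the -D flags simply define Java system properties. The sketch below uses the standard Java networking property names (`http.proxyHost`, `http.proxyPort`, `http.nonProxyHosts`); the host and port are the same illustrative values as above, and `ProxySketch` is a hypothetical helper. That the HTTP client inside your Jena version reads these properties is an assumption carried over from the command-line approach.

```java
// Sketch only: sets the standard Java HTTP proxy properties programmatically.
class ProxySketch {

    static void useProxy(String host, int port, String nonProxyHosts) {
        System.setProperty("http.proxyHost", host);
        System.setProperty("http.proxyPort", String.valueOf(port));
        // Hosts listed here (separated by "|") bypass the proxy
        System.setProperty("http.nonProxyHosts", nonProxyHosts);
    }

    public static void main(String[] args) {
        // Illustrative values; replace them with the ones from your network
        useProxy("proxy.example.com", 80, "localhost|127.0.0.1");
        System.out.println(System.getProperty("http.proxyHost"));
    }
}
```

Call such a method before the first query is executed, because HTTP clients typically read these properties when they are created.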

Wrapping Things Up

Although there are now hundreds of SPARQL endpoints delivering facts to anyone querying them, the technology is still at an early stage.
SPARQL is the query language of the Semantic Web, and even though it can be used with private datasets, its full capabilities come from federated queries, which retrieve data from one or more endpoints.

SPARQL endpoints are intended to:

  • Give other people and organizations access to your data in a very flexible way
  • Eventually realize the potential of federated SPARQL, whereby several SPARQL endpoints are combined to allow complex queries to run across multiple datasets
  • Be open for use by anyone

Other useful applications

Now let's talk about what we can do with this newly learned information. Let us say we want to build a plugin that gets information related to the video we are watching on YouTube. The plugin detects the protagonist of the video and uses SPARQL endpoints (like DBPedia) to query information about that personality, which can then be processed and displayed to the user in a sidebar.

By using SPARQL, you should be able to create such applications to display information to the user without having to store all the information yourself. Give it a try: query about your favorite YouTuber and see what you can retrieve.

Discover and read more posts from Isai B. Cicourel