Tuesday, August 23, 2011

Ruby on Rails : Consuming a SPARQL endpoint

Right,

At work, I've been plunged into semantic technologies such as OWL, SPARQL and RDF. The goal is to create a Ruby on Rails demonstration site that relies on semantic technologies to gather information from the web, based on specific keywords. Because we could take this really far, I have limited the demonstration to the following points:

  • We start from the Country in our model
  • We collect specific information such as a generic description, capital, currency and population
Ok, with that defined, we take DBPedia as our source of information (http://dbpedia.org). DBPedia has a public SPARQL endpoint that accepts SPARQL queries to retrieve information from their RDF datasets. I'm not going to discuss how all these technologies work, just how I approached the problem and solved it.

First things first, I create a class that allows me to run queries. Because I've spent several hours working out how RDF in Ruby on Rails works (and did not find a good solution), I have opted to let the SPARQL endpoint return its data to me in the form of JSON strings. The benefit here is that the amount of data transferred stays small. This is the complete class that performs the search for me on DBPedia:

# Only needed outside a Rails application that already loads these libraries.
require 'cgi'
require 'json'
require 'httparty'

# This class represents the SearchEngine to search RDF stores, relying on semantic technologies.
# The class is fine-tuned for specific searches on hardcoded repositories used for the ESCO matching
# demonstration. Special functions will be created that allow the searching of specific data used for
# the demonstration.
#
# The class can be used by creating a new instance and then calling the appropriate search function that
# will search the specific RDF store and return information related to the specific query.
class SemanticSearchEngine
  # This function queries the SPARQL endpoint of the dbpedia website for the specified country.
  # It returns a SemanticCountry built from the first result row, containing several bits of
  # information about the requested country, in the language specified.
  def country_information(country, language_code)
    query = "
      PREFIX dbo: <http://dbpedia.org/ontology/>
      PREFIX prop: <http://dbpedia.org/property/>
      PREFIX foaf: <http://xmlns.com/foaf/0.1/>
      PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
      SELECT ?abstract ?comment ?population ?capital ?currency
      WHERE {
        ?country rdfs:label '#{country}'@en;
          a dbo:Country;
          dbo:abstract ?abstract .
        FILTER langMatches( lang(?abstract), '#{language_code}')
        OPTIONAL { ?country prop:populationEstimate ?population }
        OPTIONAL { ?country prop:capital ?capital}
        OPTIONAL { ?country rdfs:comment ?comment FILTER langMatches( lang(?comment), '#{language_code}') }
        OPTIONAL { ?country prop:currency ?currency }
      }"
    # Execute the query and keep only the first result row.
    url = "http://dbpedia.org/sparql?query=#{CGI::escape(query)}&format=json"
    SemanticCountry.new(retrieve_json_data(url)['results']['bindings'].first)
  end

  private

  # Retrieves the data behind the given URL and parses the response body as JSON.
  def retrieve_json_data(url)
    JSON.parse(HTTParty.get(url).body)
  end
end
The most important part of the class is the actual query that we're sending to the SPARQL endpoint:

PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX prop: <http://dbpedia.org/property/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?abstract ?comment ?population ?capital ?currency
WHERE {
  ?country rdfs:label '#{country}'@en;
    a dbo:Country;
    dbo:abstract ?abstract .
  FILTER langMatches( lang(?abstract), '#{language_code}')
  OPTIONAL { ?country prop:populationEstimate ?population }
  OPTIONAL { ?country prop:capital ?capital}
  OPTIONAL { ?country rdfs:comment ?comment FILTER langMatches( lang(?comment), '#{language_code}') }
  OPTIONAL { ?country prop:currency ?currency }
}


The idea behind the query is as follows:
  • Select the subject whose English label is the value we specify, e.g. 'United Kingdom'
  • Verify that the subject we select is a Country according to DBPedia
  • Verify that the subject has a property called abstract, and filter the selection for the specified language
  • Optionally get the estimated population
  • Optionally get the capital of the Country
  • Optionally get the comment on the Country in the specified language
  • Optionally get the currency of the Country
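One thing the first step glosses over: the country name is interpolated straight into a quoted literal, so a name that contains an apostrophe would break the query. A minimal sketch of how the value could be escaped first, assuming a helper called sparql_escape (a made-up name, not part of my class):

```ruby
# Hypothetical helper: escapes a value for use inside a single-quoted SPARQL literal.
# Backslashes are doubled first, then apostrophes get a leading backslash.
def sparql_escape(text)
  text.gsub('\\') { '\\\\' }.gsub("'") { "\\'" }
end

name = "Côte d'Ivoire"
# The escaped value can be interpolated the same way the class does it.
puts "?country rdfs:label '#{sparql_escape(name)}'@en;"
```

For a demonstration with a fixed list of countries this hardly matters, but it becomes relevant as soon as the name comes from user input.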
Normally it should be possible to specify in the HTTP Accept header that we want the result as SPARQL results in JSON (application/sparql-results+json), but I ran into some issues with this when executing the query, so I appended &format=json to the end of the URL to make sure that the information is returned as a JSON string.
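For reference, this is roughly what combining both approaches could look like with plain Net::HTTP. It is only a sketch: build_sparql_request is a made-up helper name, the request is built but never sent, and whether the endpoint honours the header is an assumption.

```ruby
require 'cgi'
require 'net/http'
require 'uri'

# Hypothetical helper (the name is made up for this sketch): builds a GET request
# for a SPARQL query without sending it.
def build_sparql_request(query, endpoint = 'http://dbpedia.org/sparql')
  # The format=json parameter stays in place as the fallback described above.
  uri = URI.parse("#{endpoint}?query=#{CGI.escape(query)}&format=json")
  request = Net::HTTP::Get.new(uri.request_uri)
  # Media type for SPARQL JSON results; assumes the endpoint honours it.
  request['Accept'] = 'application/sparql-results+json'
  [uri, request]
end

sample_query = 'SELECT ?s WHERE { ?s ?p ?o } LIMIT 1'
uri, request = build_sparql_request(sample_query)
puts uri
puts request['Accept']
```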

To eliminate multiple rows, I only select the first hash being returned, and ignore everything else. This of course means that data can be lost, or that the wrong row is selected, but as I said before, it's only a demonstration.
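To make the structure concrete, here is a small sketch of the JSON layout a SPARQL endpoint returns and how the first row is picked from it. The sample document is hand-written with placeholder values; it is not real DBPedia output.

```ruby
require 'json'

# Hand-written sample in the SPARQL JSON results layout (placeholder values only).
sample = <<-SAMPLE
{
  "head": { "vars": ["abstract", "capital"] },
  "results": {
    "bindings": [
      { "abstract": { "type": "literal", "xml:lang": "en", "value": "First abstract" },
        "capital":  { "type": "uri", "value": "http://example.org/resource/FirstCapital" } },
      { "abstract": { "type": "literal", "xml:lang": "en", "value": "Second abstract" } }
    ]
  }
}
SAMPLE

bindings = JSON.parse(sample)['results']['bindings']
# An empty result set makes .first return nil, so fall back to an empty hash.
first_row = bindings.first || {}
puts "rows returned: #{bindings.size}"
puts first_row['abstract']['value']
```

Each row maps a variable name from the SELECT clause to a small hash with 'type', 'value' and, for language-tagged literals, 'xml:lang'. Variables from an OPTIONAL block that did not match are simply absent from the row.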

To keep the parsing logic of the JSON hashes separate, I created a small separate class that splits each hash into its values, giving small manageable entities. This is the SemanticHash class, which looks like this:

class SemanticHash
  attr_accessor :value, :language, :type

  # Initializes the instance of the class using the values stored inside the Hash.
  # The Hash is expected to contain the following keys:
  # - 'value'
  # - 'xml:lang'
  # - 'type'
  # If any of the keys is missing, a default is used instead: an empty string for
  # 'value' and 'type', and 'en' for 'xml:lang'.
  def initialize(value = {})
    self.value = value['value'] || ""
    self.language = value['xml:lang'] || 'en'
    self.type = value['type'] || ""
  end
end
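The SemanticCountry class that the search engine returns is not listed here. A minimal sketch of such a wrapper, assuming it holds one SemanticHash per selected variable, could look like this (the FIELDS constant and the nil handling are assumptions made for the sketch):

```ruby
# Compact copy of SemanticHash from this post, repeated so the sketch runs on its own.
class SemanticHash
  attr_accessor :value, :language, :type

  def initialize(value = {})
    self.value = value['value'] || ""
    self.language = value['xml:lang'] || 'en'
    self.type = value['type'] || ""
  end
end

# Hypothetical wrapper: one SemanticHash per variable selected in the query.
class SemanticCountry
  FIELDS = %w(abstract comment population capital currency)
  attr_reader(*FIELDS.map { |field| field.to_sym })

  def initialize(binding_row)
    row = binding_row || {} # nil when the query returned no rows
    FIELDS.each do |field|
      instance_variable_set("@#{field}", SemanticHash.new(row[field] || {}))
    end
  end
end

country = SemanticCountry.new(
  'abstract' => { 'type' => 'literal', 'xml:lang' => 'en', 'value' => 'Placeholder abstract' }
)
puts country.abstract.value
puts country.capital.value.empty? # OPTIONAL values that are missing become empty strings
```

With a wrapper like this the partial can call country.capital.value without checking for nil first.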
To finish up, on my views where I display the countries, I provided a new div element that would receive the information. With a link_to remote action on the view I make a call to the controller action that retrieves the information from DBPedia. The code of the action looks as follows:

country = Country.find params[:id]
engine = SemanticSearchEngine.new
@country = engine.country_information country.name, 'en'
render(:update) { |page| page.replace_html 'semantic', :partial => 'semantic/country', :layout => false}
I know this is not the cleanest way of doing this for a remote action, but I'm still struggling to understand how remote links work in Ruby on Rails 3, and this does the trick for me at the moment. The action returns JavaScript that updates the div with id semantic and places all the content from the partial view inside that div.
