Friday, September 9, 2011

Uploading files through OData

Introduction

OData is a protocol backed by Microsoft that aims to make REST a standard for web APIs. It means that your data is made available over the web, so external parties and applications can work with it without needing to go through your application.

This provides controlled, fast access to the data without the overhead of the entire web application. Note that OData doesn’t imply that you simply throw your entire database wide open on the Internet; you can still protect yourself from unwanted access by implementing an authentication system.

This document covers a specific portion of the OData protocol, namely uploading files through OData to create attachments on entities, or to create entities that represent a file on the system. Because the code in this document is taken from TenForce bvba, it goes without saying that none of the sample code may be redistributed or used without the consent of the owners.

Creating the DataService

OData in C#.NET relies on WCF to host the service. We’re not dealing with a real WCF service, though, as much of the logic under the hood operates differently from a normal WCF service. All the code examples make use of the toolkit developed by Jonathan Carter (http://wcfdstoolkit.codeplex.com/).

The first step is creating a class that inherits from the ODataService class located in the toolkit. This ensures that we follow the toolkit’s conventions. I’m not going to discuss how to set up the project or how to use the toolkit; there are enough tutorials on the Internet covering that.


The code for our base class looks like this:

[ServiceBehavior(IncludeExceptionDetailInFaults = true, InstanceContextMode = InstanceContextMode.PerSession)]
public class Api : ODataService<Context>
{
    public static void InitializeService(DataServiceConfiguration config)
    {
        config.UseVerboseErrors = true;
        config.SetEntitySetAccessRule("*", EntitySetRights.All);
        config.SetServiceOperationAccessRule("*", ServiceOperationRights.All);
        config.DataServiceBehavior.MaxProtocolVersion = DataServiceProtocolVersion.V2;
        Factory.SetImplementation(typeof(Api2.Implementation.Api));
    }

    protected override void OnStartProcessingRequest(ProcessRequestArgs args)
    {
        try
        {
            string uri = DetermineUrl();
            bool authSuccess = Authenticate(args.OperationContext, uri);
            args.OperationContext.ResponseHeaders.Add(@"TenForce-RAuth", authSuccess ? @"OK" : @"DENIED");

            if (!authSuccess) throw new AuthenticationException(@"Invalid username and/or password");
            base.OnStartProcessingRequest(args);
        }
        catch (Framework.TexecExceptions.TexecException te)
        {
            throw new AuthenticationException(string.Format("No account exists for the url '{0}'.", System.Web.HttpContext.Current.Request.Url.AbsoluteUri), te);
        }
    }

    private static string DetermineUrl()
    {
        if (OperationContext.Current.IncomingMessageProperties.ContainsKey("MicrosoftDataServicesRequestUri"))
        {
            var uri = OperationContext.Current.IncomingMessageProperties["MicrosoftDataServicesRequestUri"] as Uri;
            if (uri != null) return uri.ToString();
            throw new InvalidOperationException("Could not determine the Uri for the DataService.");
        }

        if (OperationContext.Current.IncomingMessageHeaders != null)
            return OperationContext.Current.IncomingMessageHeaders.To.AbsoluteUri;

        if (System.Web.HttpContext.Current != null)
            return System.Web.HttpContext.Current.Request.Url.ToString();

        throw new InvalidOperationException("Could not determine the Uri for the DataService.");
    }

    private static bool Authenticate(DataServiceOperationContext context, string url)
    {
        string header = context.RequestHeaders["TenForce-Auth"];
        if (string.IsNullOrEmpty(header)) return false;

        header = Encoding.UTF8.GetString(Convert.FromBase64String(header));
        string[] components = header.Split('|');
        return (components.Length >= 2) && Authenticator.Authenticate(components[0], components[1], Authenticator.ConstructDatabaseId(url));
    }
}

This class makes sure that your application can respond to OData calls. Because we mentioned security in the introduction, we have added a special requirement: anyone using our OData API must submit a header with each request that contains the username and password granting access to the API.

The class has been extended with some additional functions that extract this information from the headers and verify it against the database.
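To illustrate what a client has to do, here is a small Ruby sketch of building that header. The format follows the Authenticate method above (a Base64-encoded "username|password" pair); the credentials themselves are placeholders.

```ruby
require 'base64'

# Build the TenForce-Auth header value: Base64("username|password"),
# mirroring the decoding performed in the Authenticate method above.
# The credentials are placeholders, not real accounts.
def tenforce_auth_header(username, password)
  Base64.strict_encode64("#{username}|#{password}")
end

header = tenforce_auth_header('jdoe', 's3cret')

# The service decodes the header and splits on '|':
decoded = Base64.decode64(header).split('|')
```

The service then verifies `decoded[0]` and `decoded[1]` against the database before letting the request proceed.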

The Attachment entity

OData requires each entity to be represented by a class. Because we’re dealing with entities that need to be downloadable, we need to make a few special adjustments to the class so the service can provide the entity as binary data. The code for our class looks like this:

namespace TenForce.Execution.Api2.Objects
{
    using Microsoft.Data.Services.Toolkit.Providers;
    using System;
    using System.Data.Services.Common;

    [HasStream]
    [DataServiceKey("Id")]
    public class Attachment : IStreamEntity
    {
        public int Id { get; set; }
        public string LinkReason { get; set; }
        public DateTime Created { get; set; }
        public User Creator { get; set; }
        public DateTime Modified { get; set; }
        public User Modifier { get; set; }
        public string Filename { get; set; }
        public int Filesize { get; set; }
        public string ExternalReference { get; set; }
        public Item Item { get; set; }

        #region IStreamEntity Implementation

        public string GetContentTypeForStreaming()
        {
            return "multipart/mixed";
        }

        public Uri GetUrlForStreaming()
        {
            return new Uri(Factory.CreateApi().Attachments.GetAttachmentPath(this, false));
        }

        public string GetStreamETag()
        {
            return string.Empty;
        }

        #endregion
    }
}

What’s important to note in this class:

· The [HasStream] attribute. This attribute marks the class as a streamable entity according to the WCF standards, meaning the class can be streamed on request.

· The IStreamEntity interface. This interface is part of the toolkit and defines the functions that need to be implemented to represent the entity’s content as a stream. These mostly supply information for the response headers.

Basically, you implement the IStreamEntity interface and add the [HasStream] attribute to the class.

The AttachmentRepository

The repository is a class used by the toolkit for translating the incoming requests on the service into internal API calls. There is no base interface or class that needs to be implemented. The toolkit relies on the principle of “convention over configuration” and assumes that there is always a Repository class available for each entity. This is implemented by the RepositoryFor method in the toolkit, which will use reflection to find a class that matches this pattern.

The code for the RepositoryFor method looks like this:

public override object RepositoryFor(string fullTypeName)
{
    string typeName = fullTypeName.Replace("[]", string.Empty).Substring(fullTypeName.LastIndexOf('.') + 1);
    Type repoType = Type.GetType(string.Format("TenForce.Execution.Api2.OData.{0}Repository", typeName));
    if (repoType == null) throw new NotSupportedException();
    return Activator.CreateInstance(repoType);
}
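The name-mangling convention itself is plain string manipulation; as a rough Ruby sketch of the same idea (the namespace and type names are taken from the C# code above, purely for illustration):

```ruby
# Derive the repository class name from a full CLR type name, mirroring
# the RepositoryFor logic: strip any array suffix, take the last segment
# after '.', and append "Repository". Names are illustrative only.
def repository_name_for(full_type_name)
  type_name = full_type_name.gsub('[]', '').split('.').last
  "TenForce.Execution.Api2.OData.#{type_name}Repository"
end

name = repository_name_for('TenForce.Execution.Api2.Objects.Attachment')
```

Both `Attachment` and `Attachment[]` (a collection) resolve to the same `AttachmentRepository` name, which is exactly why the C# version strips the array suffix first.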

This code is located in the Context class that needs to be implemented. If you look back at the Api class we wrote, you’ll see that it inherits from the ODataService class. Context is the class that represents the “Context” for the service and tells it where to retrieve the data for each supported entity.

When dealing with the Entity Framework, this would be a wrapper class that maps all calls to the related database tables. In the case of the toolkit, we use the reflection provider to map all calls to the relevant properties on the class, which in turn direct the call to the correct Repository implementation.

A property for the Attachment entity looks like this on the Context class:

/// <summary>
/// Gets a queryable collection containing Attachment entities for the OData WebService.
/// </summary>
public IQueryable<Attachment> Attachments { get { return CreateQuery<Attachment>(); } }

The property calls the CreateQuery method of the context, which dives into the toolkit to figure out what exactly needs to be done. I’m not going to dive too deeply into the toolkit here, but basically the CreateQuery statement analyzes the submitted query and checks whether paging, filtering or other OData-supported operations need to be applied.
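The operations CreateQuery has to account for arrive as standard OData system query options on the URL ($top and $skip for paging, $filter for filtering). A small Ruby sketch of composing such a URL (the service root is a placeholder, not the real endpoint):

```ruby
require 'cgi'

# Compose an OData query URL from standard system query options:
# $top/$skip implement paging, $filter implements filtering.
# The service root below is a placeholder.
def odata_query_url(service_root, entity_set, options = {})
  query = options.map { |k, v| "$#{k}=#{CGI.escape(v.to_s)}" }.join('&')
  "#{service_root}/#{entity_set}?#{query}"
end

url = odata_query_url('http://localhost/Api.svc', 'Attachments',
                      top: 10, skip: 20, filter: "Filename eq 'report.pdf'")
```

A request like this would return attachments 21 through 30 matching the filter, with the toolkit translating the options into the corresponding operations on the IQueryable.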

A final note: the Context class itself inherits from the ODataContext class, which can be found in the toolkit.

Going back to the Repository class for our Attachment entity, we need to follow several conventions from the toolkit. To make things easy for ourselves, we implemented these conventions in an abstract base class from which we inherit. The benefit is that all common logic is stored in a single class.

GetProvider()

GetProvider is a method we have implemented on every repository that gives quick access to the correct ICrud<> implementation. We do this because the ICrud implementations represent our internal API, and which implementation is used depends on who’s calling it. We rely on dependency injection to figure that out for us, and just program against the interface:

/// <summary>
/// Returns the IAttachmentsProvider implementation that will handle all the requests.
/// </summary>
/// <returns>The IAttachmentsProvider implementation.</returns>
protected override IAttachmentsProvider GetProvider()
{
    return Factory.CreateApi().Attachments;
}

IStreamRepository implementation

Because our Repository needs to work with streamable entities, we need to inform the OData service that we’re dealing with this kind of entity. This is done by implementing the IStreamRepository interface. Again, this interface comes from the toolkit and defines a specific set of functions we have to implement.

DeleteStream

This function deletes the stream of the entity, and the physical source of the stream on the server. In our case this means that we actually delete the file on disk that is linked to the given Attachment entity.

/// <summary>
/// Deletes all the resources of the given attachment.
/// </summary>
/// <param name="entity">The entity whose resources need to be deleted.</param>
public void DeleteStream(object entity)
{
    if (entity as Attachment == null) return;
    Factory.CreateApi().Attachments.DeleteFile(entity as Attachment);
}

GetWriteStream

This function actually performs the bulk of the work. Its responsibility is to provide a Stream into which the service can store the data sent by the client. Getting this right requires understanding how the toolkit behaves here.

This means having a good understanding of the workings of the OData protocol:

  • The initial upload contains the binary data of the file being sent.
  • Thus we need to create a dummy instance first and link the file to it
  • In a subsequent PUT/MERGE call, we need to update the created entity.

The function keeps track of this in our case, but doesn’t handle MERGE requests. We rely on a second PUT from the user to update the linked entity.

/// <summary>
/// Returns a valid Stream implementation where the StreamProvider can write the data in.
/// </summary>
/// <param name="entity">The entity whose binary data needs to be saved.</param>
/// <param name="operationContext">Information about the current operation context.</param>
/// <returns>A valid Stream implementation pointing to the correct location.</returns>
/// <exception cref="ArgumentException">Invalid entity supplied for this Repository.</exception>
public Stream GetWriteStream(object entity, System.Data.Services.DataServiceOperationContext operationContext)
{
    if (entity as Attachment == null) throw new ArgumentException("Invalid entity supplied.");

    // Handle the POST request. A POST means that a new file is being uploaded to create
    // a new attachment, so we need to retrieve the filename from the Slug header if
    // present, or generate a dummy one.
    if (operationContext.RequestMethod == "POST")
    {
        // Save the required properties of the Attachment entity.
        (entity as Attachment).Item = new Item { Id = int.Parse(operationContext.RequestHeaders["Slug"]) };
        (entity as Attachment).Filename = string.Format("{0}_{1}{2}{3}{4}{5}{6}{7}_newfile",
            (entity as Attachment).Item.Id,
            DateTime.Now.Year,
            DateTime.Now.Month,
            DateTime.Now.Day,
            DateTime.Now.Hour,
            DateTime.Now.Minute,
            DateTime.Now.Second,
            DateTime.Now.Millisecond);
        (entity as Attachment).Created = DateTime.Now;
        (entity as Attachment).Modified = DateTime.Now;

        // Return the filestream to temporarily store the file.
        return new FileStream(GetProvider().GetAttachmentPath(entity as Attachment, true), FileMode.OpenOrCreate);
    }

    // Handle the PUT request. The Attachment should have been saved in the database by
    // now, so all properties should have correct values.
    return new FileStream(Factory.CreateApi().Attachments.GetAttachmentPath(entity as Attachment, false), FileMode.Create, FileAccess.ReadWrite);
}

That’s actually all you need to properly handle the upload of a file. If we run this through OData, we can simply send a file in a POST request and later retrieve it through a URL like http://<server>/Api.svc/Attachment(id)/$value, which will download the file for us.
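To make the two-step flow concrete, here is a hedged Ruby sketch of the client side. The host, item id and credentials are placeholders, and the snippet only builds the request objects so the headers can be inspected; it does not contact a live service.

```ruby
require 'net/http'
require 'base64'

# Build (but do not send) the POST that uploads the raw file bytes.
# The Slug header carries the id of the item the attachment belongs to,
# and TenForce-Auth carries Base64("username|password") as described
# earlier. Host, credentials and ids are placeholders.
uri = URI('http://localhost/Api.svc/Attachments')
post = Net::HTTP::Post.new(uri)
post['Slug'] = '42'
post['TenForce-Auth'] = Base64.strict_encode64('jdoe|s3cret')
post['Content-Type'] = 'application/octet-stream'
post.body = 'binary file contents would go here'

# The follow-up PUT updates the entity that the POST created.
put = Net::HTTP::Put.new(URI('http://localhost/Api.svc/Attachments(1)'))
put['TenForce-Auth'] = post['TenForce-Auth']
```

Sending each request with `Net::HTTP.start(uri.host, uri.port) { |http| http.request(post) }` against a real endpoint would exercise the POST and PUT branches of GetWriteStream above.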

Thursday, August 25, 2011

link_to remote in Rails 3

If you're like me, and come from a Rails 2 background, you'll probably have noticed that in Rails 3 the way of creating these nifty AJAX links is completely different than it used to be. Actually it's not that much different, you're just required to do a lot more work, and honestly the documentation on this is piss-poor on the web. I've found a few blogs that gave some insight into what needs to be done, but they're all written for jQuery.

In this case, I'm going to show you how to create a simple link_to remote which uses Prototype to call an action on one of our controllers and relies on a simple js.erb file to update a div element on our page.

The first step is to create a JavaScript file. We'll call it bindings.js and it should be located in your /public/javascripts folder. The content of the file should look something like this:

document.observe("dom:loaded", function(){
  $('our_div_id')
    .observe("ajax:success", function(evt, data, status, xhr){
      // The response is Javascript, so evaluate it.
      eval(xhr.responseText);
      // Return false to avoid jumping
      return false;
    })
    .observe("ajax:failure", function(evt, data, status, xhr){
      alert("failed");
      // Insert a custom error message when something goes wrong
      $('our_div_id').replace('<div id="our_div_id"></div>');
      $('our_div_id').insert('<p>A problem occurred.</p>');
      // Return false to avoid jumping
      return false;
    });
});


Make sure that you replace 'our_div_id' with the actual id of your div tag. Note that, unlike jQuery, you do not need to prepend the # symbol to the id.

The general idea behind the JavaScript is that we observe the 'dom:loaded' event of the HTML document, which is triggered after the page loads, and then bind handlers for the 'ajax:success' and 'ajax:failure' events.
Why these two events? Because they're the ones defined in rails.js, fired when an AJAX call succeeds or fails. If you want to know the details, check the rails.js file.

I'm not going to discuss what your controller action should look like, as it's nothing special. I personally don't bother with the respond_to |format| stuff, and just let Rails decide which action template needs to be served; in most cases this is correct.
The contents of your action.js.erb file should be pure JavaScript. Here is the content of mine:

$('semantic').replace('<div id="semantic"></div>');

$('semantic').insert("<p><%=h @country.abstract.value %></p>");

$('semantic').insert("<p><%=h @country.comment.value %></p>");

$('semantic').insert("<p>Currency: <%=h @country.currency.value %></p>");

$('semantic').insert("<p>Population: <%=h @country.population.value %></p>");

$('semantic').insert("<p>Capital: <%=h @country.capital.value %></p>");


Basically, this piece of JavaScript updates the semantic div on my page with new content.

The last step is creating the actual link. This is the easiest part, and the syntax looks like this:

= link_to @vacancy.country.name, semantic_country_url(@vacancy.country.id), :remote => true

The trick is in the ":remote => true" part, which allows rails to make it an AJAX call.

In short:

  • Create a JavaScript file that binds the AJAX callbacks to your element
  • Create a js.erb (or other template) to generate the JavaScript output
  • Create a link_to with :remote => true

Hope this makes it clear how AJAX works in Rails 3.

Tuesday, August 23, 2011

Ruby on Rails : Consuming a SPARQL endpoint

Right,

At work, I've been plunged into the semantic technologies such as OWL, SPARQL and RDF. The goal is to create a Ruby on Rails demonstration site that relies on semantic technologies to gather information from the web, based on specific keywords. Because we can go really far into this, I have taken the following points for demonstration purposes:

  • We base ourselves on the Country of our model
  • We collect specific information such as a generic description, capital, currency and population

Ok, with that defined, we take dbpedia as our source of information (http://dbpedia.org). DBPedia has a public SPARQL endpoint that can be used to run SPARQL queries against their RDF datasets. I'm not going to discuss how all these technologies work, just how I approached the problem and solved it.

First things first, I created a class that allows me to run queries. Because I've spent several hours working out how RDF in Ruby on Rails works (and did not find a good solution), I have opted to let the SPARQL endpoint return its data to me as JSON strings. The benefit is that the amount of data transferred is as small as possible. This is the complete class that performs the search on DBPedia for me:

# This class represents the SearchEngine to search RDF stored, relying on semantic technologies.
# The class is fine tuned for specific searches on hardcoded repositories used for the ESCO matching
# demonstration. Special functions will be created that allow the searching of specific data used for
# the demonstration.
#
# The class can be used by creating a new instance and then call the appropriate search function that
# will search the specific RDF store and return information related to the specific query. The information
# returned will always be as a single string, which can be used to display on a website.
class SemanticSearchEngine
  # This function will try to query the SPARQL endpoint of the dbpedia website and return the absolute
  # URL to the RDF store for the specified country.
  # The function returns an RDF triplet collection containing several bits of information
  # about the requested country, in the language specified.
  def country_information(country, language_code)
    query = "
      PREFIX dbo: <http://dbpedia.org/ontology/>
      PREFIX prop: <http://dbpedia.org/property/>
      PREFIX foaf: <http://xmlns.com/foaf/0.1/>
      PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
      SELECT ?abstract ?comment ?population ?capital ?currency
      WHERE {
        ?country rdfs:label '#{country}'@en;
                 a dbo:Country;
                 dbo:abstract ?abstract .
        FILTER langMatches( lang(?abstract), '#{language_code}')
        OPTIONAL { ?country prop:populationEstimate ?population }
        OPTIONAL { ?country prop:capital ?capital }
        OPTIONAL { ?country rdfs:comment ?comment FILTER langMatches( lang(?comment), '#{language_code}') }
        OPTIONAL { ?country prop:currency ?currency }
      }"
    # Execute the query and retrieve the RDF data.
    SemanticCountry.new(retrieve_json_data("http://dbpedia.org/sparql?query=#{CGI::escape(query)}&format=json")['results']['bindings'].first)
  end

  private

  # Retrieves the information from ESCO in JSON format with the given URL.
  # If no data is found, an empty JSON hash is returned instead.
  def retrieve_json_data(url)
    JSON.parse HTTParty.get url
  end
end
The most important part of the class is the actual query that we send to the SPARQL endpoint:

PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX prop: <http://dbpedia.org/property/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?abstract ?comment ?population ?capital ?currency
WHERE {
  ?country rdfs:label '#{country}'@en;
           a dbo:Country;
           dbo:abstract ?abstract .
  FILTER langMatches( lang(?abstract), '#{language_code}')
  OPTIONAL { ?country prop:populationEstimate ?population }
  OPTIONAL { ?country prop:capital ?capital }
  OPTIONAL { ?country rdfs:comment ?comment FILTER langMatches( lang(?comment), '#{language_code}') }
  OPTIONAL { ?country prop:currency ?currency }
}


The idea behind the query is as follows:
  • Select the country whose label has the value we specify, e.g. 'United Kingdom'
  • Verify that the resource we select is a Country according to DBPedia
  • Verify that the subject has an abstract property, and filter the selection for the specified language
  • Optionally get the estimated population
  • Optionally get the capital of the Country
  • Optionally get any comments about the Country in the specified language

Normally it should be possible to specify in the HTTP header that we want the results back as SPARQL results + JSON, but I encountered some issues with this while executing the query, so I appended &format=json to the URL to make sure the information is returned as a JSON string.

To eliminate multiple rows, I only select the first hash being returned and ignore everything else. This of course leads to the potential case where data is lost or the wrong data is selected, but as I said before, it's only a demonstration.
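The shape of the JSON that comes back, and the "first row only" selection, can be sketched like this. The response below is a trimmed, hand-written sample in the SPARQL JSON results format, not a live dbpedia response.

```ruby
require 'json'

# A trimmed, hand-written sample in the SPARQL JSON results format:
# each row of bindings maps variable names to value/type/language hashes.
sample = <<-RESPONSE
{"results": {"bindings": [
  {"abstract": {"type": "literal", "xml:lang": "en", "value": "The United Kingdom is a sovereign state."},
   "capital":  {"type": "uri", "value": "http://dbpedia.org/resource/London"}}
]}}
RESPONSE

# Keep only the first row of bindings, as the engine above does.
first_row = JSON.parse(sample)['results']['bindings'].first
```

Each entry of `first_row` is one of the hashes that the SemanticHash class below is designed to unpack.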

To cleanly separate the parsing logic for the JSON hashes, I created a small separate class that breaks the hash down into small manageable values. This is the SemanticHash class, which looks like this:

class SemanticHash
  attr_accessor :value, :language, :type

  # Initializes the instance of the class using the values stored inside the Hash.
  # The Hash is expected to contain the following keys:
  # - 'value'
  # - 'xml:lang'
  # - 'type'
  # If any of the keys is missing, a sensible default is used instead.
  def initialize(value = {})
    self.value = value['value'] || ""
    self.language = value['xml:lang'] || 'en'
    self.type = value['type'] || ""
  end
end
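Using the class with one of the binding hashes looks like this. The class is repeated so the snippet runs on its own, and the binding hash is a hand-written sample in the dbpedia format:

```ruby
# Repeated from above so this snippet is self-contained.
class SemanticHash
  attr_accessor :value, :language, :type

  # Missing keys fall back to sensible defaults.
  def initialize(value = {})
    self.value = value['value'] || ""
    self.language = value['xml:lang'] || 'en'
    self.type = value['type'] || ""
  end
end

# A hand-written sample binding in the dbpedia JSON format.
abstract = SemanticHash.new('value' => 'A sovereign state.', 'xml:lang' => 'en', 'type' => 'literal')

# With no hash at all, the defaults kick in.
fallback = SemanticHash.new
```

On the view side this means each field can be rendered with e.g. `abstract.value` without worrying about missing keys in the response.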
To finish up, on the views where I display the countries I provided a new div element to receive the information. With a link_to remote on the view I call the controller action that retrieves the information from DBPedia. The code of the action looks as follows:

country = Country.find params[:id]
engine = SemanticSearchEngine.new
@country = engine.country_information country.name, 'en'
render(:update) { |page| page.replace_html 'semantic', :partial => 'semantic/country', :layout => false}
I know this is not the cleanest way of doing this for a remote action, but I'm still struggling to understand how remote links work in Ruby on Rails 3, and it does the trick for me at the moment. This returns JavaScript that updates the div with id semantic and places the content of the partial view inside that div.