Thoughts Concerning Serving Linked Data from Apache Sling

Monday, April 2, 2012

Reviewing the JCR specification as well as some of the Apache Sling documentation, I caught a whiff of congruity between the functionality they expose and the tenets of Linked Data as they are popularly defined in relation to the Semantic Web (listing 1). What struck me was that the congruity seemed inherent to JCR and Sling and was, as far as I can tell, unintentional. This happy accident raised the question of how easily Linked Data can be housed and served using Sling and JCR with the facilities built into each.

  • Use URIs to identify things
  • Use HTTP URIs so that these things can be referred to and looked up by people and user agents
  • Provide useful information about the thing when its URI is dereferenced using standard formats such as RDF/XML
  • Include links to other, related URIs in the exposed data to improve discovery of other related info on the web

listing 1 [http://en.wikipedia.org/wiki/Linked_data#Principles]

A second, and perhaps more important, question worthy of brief consideration is why I care (and why I think anyone else should care) whether Linked Data can be hosted and served via Sling. Linked Data can be served in multiple formats from nearly any web server. However, few web servers can claim to be built for this purpose. I would argue, and do, that Sling comes closer than most for a number of use cases. As such, if you are weighing architectures for a new app or site serving up Semantic … stuff ... and have not already sunk considerable time (and/or money) into another server, Sling is worth considering.

Now that I’ve laid out the bottom line, the remainder of this post stands as supporting evidence. Cobbled together from coffee-fueled mental meanderings, the enumeration below attempts to line up the tenets of Linked Data (listing 1) with the functionality that Sling and JCR expose directly. Where I was not able to do this, I offer some thoughts on the most direct route to the particular end. At some point I hope to dive down a proof-of-concept (POC) rabbit hole to provide material for a future post.

Tenet 1: There is a fairly immediate correlation between a URI and the Sling concept of a path. A URI identifies a resource in a Linked Data context, while a path identifies a resource in a content repository context. The mapping between URIs and resources, however, is meant to be one-to-one. That is to say, unless told otherwise (such as via an owl:sameAs relationship), two URIs which differ in any way are assumed to point to different resources. A resource at a given path, however, may be reached through any number of URIs because of Sling’s URL decomposition: selectors and the extension are split off the request URL before the resource is resolved, so /content/page.html and /content/page.print.html both address the resource at /content/page. In an application that uses functionality such as selectors, steps would need to be taken to preserve the integrity of the URI-to-resource mapping. So long as this is taken into account, paths fit the identification bill, so to speak.
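To make the selector problem concrete, here is a simplified sketch of Sling-style URL decomposition in Java. It assumes resource names contain no dots and that there is no suffix; real Sling resolves the resource path against the repository, which this sketch skips entirely. The `canonicalUri` helper and the `http://example.org` base are hypothetical, illustrating one way to publish a single identifying URI per resource.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class UrlDecomposition {

    /** The parts of a request path after (simplified) decomposition. */
    static final class Decomposed {
        final String resourcePath;
        final List<String> selectors;
        final String extension; // null when the path has no extension

        Decomposed(String resourcePath, List<String> selectors, String extension) {
            this.resourcePath = resourcePath;
            this.selectors = selectors;
            this.extension = extension;
        }
    }

    /**
     * Splits the last path segment at its first dot: the part before it stays in
     * the resource path, the dot-separated parts after it are selectors, and the
     * final part is the extension. Assumes no dots in resource names and no suffix.
     */
    static Decomposed decompose(String requestPath) {
        int lastSlash = requestPath.lastIndexOf('/');
        int firstDot = requestPath.indexOf('.', lastSlash + 1);
        if (firstDot < 0) {
            return new Decomposed(requestPath, Collections.emptyList(), null);
        }
        // limit -1 keeps trailing empty parts, so "name." or "name.." cannot underflow
        String[] parts = requestPath.substring(firstDot + 1).split("\\.", -1);
        List<String> selectors =
            new ArrayList<>(Arrays.asList(parts).subList(0, parts.length - 1));
        return new Decomposed(requestPath.substring(0, firstDot), selectors,
            parts[parts.length - 1]);
    }

    /** Hypothetical helper: the single URI we publish as the resource's identifier. */
    static String canonicalUri(String base, String requestPath) {
        return base + decompose(requestPath).resourcePath;
    }

    public static void main(String[] args) {
        String[] requests = {
            "/content/books/moby-dick.html",
            "/content/books/moby-dick.print.a4.html",
            "/content/books/moby-dick.ttl"
        };
        for (String r : requests) {
            Decomposed d = decompose(r);
            System.out.println(r + " -> resource=" + d.resourcePath
                + " selectors=" + d.selectors + " extension=" + d.extension
                + " canonical=" + canonicalUri("http://example.org", r));
        }
    }
}
```

All three example requests decompose to the same resource path, which is why an application publishing Linked Data would want to advertise only one form (here, the extension-free path) as the resource's identifier and treat the others as views of it.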

Tenet 2: Sling, being RESTful in nature, fits this tenet well. A user or user agent can easily request a resource via HTTP and can even request different views of that resource, provided the concerns raised under Tenet 1 are properly addressed. Going a step beyond requesting a resource, Sling’s WebDAV support and its default POST handling (the Sling POST servlet) open up interesting (and dare I say exciting) possibilities when it comes to the creation, destruction, and modification of resources. With the POST servlet, one can modify an existing resource simply by sending a POST request directly to the canonical URI of the resource. From a purist standpoint this is preferable to alternatives such as modifying resources via an application or API that is divorced from the resource itself.
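As a rough illustration of the POST-to-the-resource style, the Java sketch below builds such a request with the JDK's `java.net.http` client. The Sling POST servlet treats form fields as property names and values; the host, port, path, and properties used here are hypothetical, and authentication (which a real Sling instance will require for writes) is omitted. The request is only sent when the program is run with `--send`.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class SlingPostSketch {

    /** Encodes property/value pairs as an application/x-www-form-urlencoded body. */
    static String formBody(Map<String, String> properties) {
        return properties.entrySet().stream()
            .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
            .collect(Collectors.joining("&"));
    }

    /** Builds a POST aimed directly at the resource's own URI (no separate API endpoint). */
    static HttpRequest updateRequest(String resourceUri, Map<String, String> properties) {
        return HttpRequest.newBuilder(URI.create(resourceUri))
            .header("Content-Type", "application/x-www-form-urlencoded")
            .POST(HttpRequest.BodyPublishers.ofString(formBody(properties)))
            .build();
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("jcr:title", "Moby-Dick");
        props.put("author", "Herman Melville");
        // Hypothetical local Sling instance and resource path.
        HttpRequest request =
            updateRequest("http://localhost:8080/content/books/moby-dick", props);
        System.out.println(request.method() + " " + request.uri() + "\n" + formBody(props));

        if (args.length > 0 && args[0].equals("--send")) {
            HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println("status " + response.statusCode());
        }
    }
}
```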

Tenet 3: Here there is little built into Sling or JCR to directly assist us. The goal can essentially be broken into two sub-goals. First, the modeling of data in such a way that it can be represented as RDF triples. Second, the actual delivery of RDF triples. I will speak about these in reverse order.

RDF can take a number of forms. The two I’m interested in here are RDFa embedded within HTML, and Turtle (I find the XML format too cumbersome to be worth writing about). Sling provides a number of mechanisms for specifying the desired presentation of a resource, which may be leveraged for our end provided, again, that the concerns raised under Tenet 1 are considered. The most straightforward, as I see it, is selecting the rendering script or servlet via the URL’s extension. A common example is rendering a resource as HTML by using a .html extension versus as JSON by using a .json extension.
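A minimal Java sketch of the idea that the extension alone decides how the same resource is presented. The table is hypothetical: in Sling the equivalent role is played by scripts or servlets registered for a resource type and extension, not by a map. The media types listed are the standard ones for each format.

```java
import java.util.Map;
import java.util.Optional;

public class ExtensionMediaTypes {

    // Hypothetical extension-to-format table standing in for Sling's
    // script/servlet resolution by extension.
    static final Map<String, String> MEDIA_TYPES = Map.of(
        "html", "text/html",
        "json", "application/json",
        "ttl", "text/turtle",
        "rdf", "application/rdf+xml");

    /** Returns the media type implied by the request's extension, if one is registered. */
    static Optional<String> mediaTypeFor(String requestPath) {
        int lastSlash = requestPath.lastIndexOf('/');
        int lastDot = requestPath.lastIndexOf('.');
        if (lastDot <= lastSlash) {
            return Optional.empty(); // no extension in the last path segment
        }
        return Optional.ofNullable(MEDIA_TYPES.get(requestPath.substring(lastDot + 1)));
    }

    public static void main(String[] args) {
        for (String path : new String[] {
                "/content/books/moby-dick.html",
                "/content/books/moby-dick.ttl",
                "/content/books/moby-dick"}) {
            System.out.println(path + " -> " + mediaTypeFor(path).orElse("(no renderer)"));
        }
    }
}
```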

So long as our JCR resources are modeled in a way that facilitates exposing their gooey innards (which we will get to in a moment), it is fairly easy to see how a JSP (or the like) could be written to present HTML with RDFa embedded. We could also conceivably support a .rdf or .ttl extension, handled by a separate JSP or servlet. This would need to be coded, but the point is that Sling provides the plumbing through which we may pour our water.
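To give a feel for what a .ttl renderer would have to do, here is a hedged Java sketch that serializes one node's properties as Turtle. It deliberately avoids the Sling API: the property map stands in for a node's properties, the `http://example.org` base is hypothetical, and it assumes (as a made-up convention) that property names are already full predicate IRIs and that string values starting with "/" are repository paths to be emitted as links rather than literals.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TurtleSketch {

    // Hypothetical base under which repository paths are published as URIs.
    static final String BASE = "http://example.org";

    /** Escapes a string for use inside a double-quoted Turtle literal. */
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"")
                .replace("\n", "\\n").replace("\r", "\\r");
    }

    /** Renders one node (its path plus properties) as a single Turtle statement group. */
    static String toTurtle(String path, Map<String, Object> properties) {
        if (properties.isEmpty()) {
            return ""; // a subject with no predicates is not valid Turtle
        }
        StringBuilder sb = new StringBuilder("<" + BASE + path + ">");
        String separator = "\n    ";
        for (Map.Entry<String, Object> e : properties.entrySet()) {
            sb.append(separator).append('<').append(e.getKey()).append("> ");
            Object value = e.getValue();
            if (value instanceof String && ((String) value).startsWith("/")) {
                sb.append('<').append(BASE).append(value).append('>'); // link to another resource
            } else if (value instanceof Integer || value instanceof Long) {
                sb.append(value); // bare integers are xsd:integer literals in Turtle
            } else {
                sb.append('"').append(escape(String.valueOf(value))).append('"');
            }
            separator = " ;\n    ";
        }
        return sb.append(" .\n").toString();
    }

    public static void main(String[] args) {
        Map<String, Object> props = new LinkedHashMap<>();
        props.put("http://purl.org/dc/terms/title", "Moby-Dick");
        props.put("http://purl.org/dc/terms/creator", "/content/people/melville");
        System.out.print(toTurtle("/content/books/moby-dick", props));
    }
}
```

The creator path comes out as a URI rather than a string, which is the same move Tenet 4 asks for: related resources are linked, not merely named.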

Concerning the modeling of data, we look to the JCR itself. The JCR specification sets forth the concept of Node Types, which can be applied to Nodes within a Repository. One aspect of typing is the definition and enforcement of a node's structure. Types in JCR follow a fairly standard inheritance model, such that a type can be a subtype of another type (or types). There are some parallels here between Node Types and the OWL concept of Classes. Conceivably, one could go so far as to devise an algorithm that maps OWL classes into JCR Node Types and vice versa. There are likely limits to the complexity of OWL classes that could be reliably represented, as a perfect isomorphism most likely does not exist between OWL Classes and Node Types. An immediate restriction that comes to mind is that a JCR node has exactly one primary type (jcr:primaryType). In RDF (and thus in OWL), a resource may have any number of rdf:type declarations made about it. Additional types can be added in JCR as mixin types (jcr:mixinTypes); however, decisions would need to be made about how to assign classes when a resource has multiple types.
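One possible (and entirely hypothetical) policy for the multiple-type problem, sketched in Java: discard any asserted rdf:type already implied by a more specific asserted type (JCR node types inherit from their supertypes, so nothing is lost), make the first remaining type in alphabetical order the primary type (an arbitrary but deterministic tie-break), and record the rest as mixins. The hierarchy map and the `ex:` names are illustrative; a real mapping would also have to ensure that every class chosen as a mixin is actually defined as a mixin node type, which JCR requires.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class TypeAssignment {

    static final class Assignment {
        final String primaryType;
        final List<String> mixinTypes;

        Assignment(String primaryType, List<String> mixinTypes) {
            this.primaryType = primaryType;
            this.mixinTypes = mixinTypes;
        }
    }

    /** True if type inherits (directly or transitively) from ancestor. */
    static boolean isSubtypeOf(String type, String ancestor, Map<String, List<String>> supertypes) {
        Deque<String> pending = new ArrayDeque<>(supertypes.getOrDefault(type, List.of()));
        Set<String> seen = new HashSet<>(); // guards against cyclic hierarchies
        while (!pending.isEmpty()) {
            String current = pending.pop();
            if (current.equals(ancestor)) {
                return true;
            }
            if (seen.add(current)) {
                pending.addAll(supertypes.getOrDefault(current, List.of()));
            }
        }
        return false;
    }

    /** Chooses one primary type and a list of mixins for a set of asserted rdf:types. */
    static Assignment assign(Collection<String> rdfTypes, Map<String, List<String>> supertypes,
                             String fallbackPrimary) {
        Set<String> asserted = new TreeSet<>(rdfTypes); // dedupe and sort for a stable choice
        List<String> mostSpecific = new ArrayList<>();
        for (String t : asserted) {
            boolean implied = false;
            for (String other : asserted) {
                if (!other.equals(t) && isSubtypeOf(other, t, supertypes)) {
                    implied = true;
                    break;
                }
            }
            if (!implied) {
                mostSpecific.add(t);
            }
        }
        if (mostSpecific.isEmpty()) {
            return new Assignment(fallbackPrimary, List.of());
        }
        return new Assignment(mostSpecific.get(0),
            new ArrayList<>(mostSpecific.subList(1, mostSpecific.size())));
    }

    public static void main(String[] args) {
        Map<String, List<String>> hierarchy = Map.of(
            "ex:Novel", List.of("ex:Book"),
            "ex:Book", List.of("ex:Work"));
        Assignment a = assign(List.of("ex:Book", "ex:Novel", "ex:Translation"),
            hierarchy, "nt:unstructured");
        System.out.println("primary=" + a.primaryType + " mixins=" + a.mixinTypes);
    }
}
```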

Tenet 4: Implementation of this point is fairly trivial, insofar as almost every web page already accomplishes it via anchor (a href) tags. So long as Tenets 1 through 3 have been realized, this goal can be achieved simply by making sure links to related resources are pervasive.
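For the RDFa flavor of this, the anchors can carry the relationship as well as the target. Below is a small Java sketch, assuming a hypothetical `Link` holder and using the RDFa 1.1 `about`, `prefix`, and `rel` attributes, so that each anchor states a triple (the subject resource, the `rel` predicate, the linked resource) in addition to being an ordinary hyperlink. The paths and labels are illustrative.

```java
import java.util.List;

public class RelatedLinks {

    /** Hypothetical holder for one outgoing relationship. */
    static final class Link {
        final String predicate; // CURIE such as "dcterms:creator"
        final String href;      // path or URI of the related resource
        final String label;     // human-readable link text

        Link(String predicate, String href, String label) {
            this.predicate = predicate;
            this.href = href;
            this.label = label;
        }
    }

    /** Minimal escaping for HTML text and attribute values. */
    static String escapeHtml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    /** Renders related links as a list whose anchors double as RDFa statements. */
    static String render(String subject, List<Link> links) {
        StringBuilder sb = new StringBuilder();
        sb.append("<ul about=\"").append(escapeHtml(subject))
          .append("\" prefix=\"dcterms: http://purl.org/dc/terms/\">\n");
        for (Link link : links) {
            sb.append("  <li><a rel=\"").append(escapeHtml(link.predicate))
              .append("\" href=\"").append(escapeHtml(link.href)).append("\">")
              .append(escapeHtml(link.label)).append("</a></li>\n");
        }
        return sb.append("</ul>\n").toString();
    }

    public static void main(String[] args) {
        System.out.print(render("/content/books/moby-dick", List.of(
            new Link("dcterms:creator", "/content/people/melville", "Herman Melville"),
            new Link("dcterms:subject", "/content/topics/whaling", "Whaling"))));
    }
}
```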
