Supporting Arbitrary Custom Datatypes in RDF and SPARQL
Abstract
In the Resource Description Framework, literals are composed of a
Unicode string (the lexical form), a datatype IRI, and optionally, when the
datatype IRI is rdf:langString, a language tag. Any IRI can take the place of
a datatype IRI, but the specification only defines the precise meaning of a literal
when the datatype IRI belongs to a predefined subset. Custom datatypes have been
reported in use on the Web of Data, and offer advantages for representing certain
classical structures. Yet their support by RDF processors is rare and
implementation-specific. In this paper, we first present the minimal set of functions that
should be defined in order to make a custom datatype usable in query answering
and reasoning. Based on this, we discuss solutions that would enable: (i) data publishers
to publish the definition of arbitrary custom datatypes on the Web, and (ii)
generic RDF processors or SPARQL query engines to discover custom datatypes
on-the-fly, and to perform operations on them accordingly. Finally, we detail a
concrete solution that targets arbitrarily complex custom datatypes, give an overview of
its implementation in Jena and ARQ, and report the results of an experiment
on a real-world DBpedia use case.