A SPARQL extension for generating RDF from heterogeneous formats
Abstract
RDF aims to be the universal abstract data model for structured
data on the Web. While there are efforts to convert data to RDF, the vast majority
of data available on the Web does not conform to RDF. Indeed, exposing data
in RDF, either natively or through wrappers, can be very costly. Furthermore,
in the emerging Web of Things, the resource constraints of devices prevent them from
processing RDF graphs. Hence, one cannot expect all the data on the Web to
be available as RDF anytime soon. Several tools can generate RDF from non-RDF
data, and transformation or mapping languages have been designed to offer
more flexible solutions (GRDDL, XSPARQL, R2RML, RML, CSVW, etc.). In
this paper, we introduce a new language, SPARQL-Generate, that generates RDF
from: (i) an RDF Dataset, and (ii) a set of documents in arbitrary formats. As
SPARQL-Generate is designed as an extension of SPARQL 1.1, it can provably:
(i) be implemented on top of any existing SPARQL engine, and (ii) leverage the
SPARQL extension mechanism to deal with an open set of formats. Furthermore,
we provide evidence that (iii) it can be easily learned by knowledge engineers who
know SPARQL 1.1, and (iv) our first naive open-source implementation performs
better than the reference implementation of RML for large transformations.