Scalable and Distributed Methods for Entity Matching, Consolidation and Disambiguation over Linked Data Corpora

Abstract: This paper discusses scalable and distributed methods for entity consolidation (a.k.a. smushing, entity resolution, object consolidation, etc.) over large-scale, static Linked Data corpora, with the aim of locating and processing names that signify the same entity. We investigate (i) a baseline approach, which uses explicit owl:sameAs relations to perform consolidation; (ii) extended entity consolidation, which additionally uses a subset of OWL 2 RL/RDF rules to derive novel owl:sameAs relations through the semantics of inverse-functional properties, functional properties and (max-)cardinality restrictions with value one; (iii) deriving weighted concurrence measures between entities in the corpus based on shared inlinks/outlinks and attribute values using statistical analyses; (iv) disambiguating (initially) consolidated entities based on inconsistency detection using OWL 2 RL/RDF rules. Our methods are based upon distributed sorts and scans of the corpus, where we deliberately avoid the requirement for indexing all data. Throughout, we offer evaluation over a diverse Linked Data corpus consisting of 1.118 billion quadruples derived from a domain-agnostic, open crawl of 3.985 million RDF/XML Web documents, demonstrating the feasibility of our methods at that scale, and giving insights into the quality of the results for real-world data.
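
To make the baseline approach (i) concrete, the following is a minimal illustrative sketch, not the authors' implementation. The abstract describes distributed sorts and scans over the corpus; this sketch instead assumes the data fits in memory and uses a union-find structure to group identifiers connected by owl:sameAs. The function names (`find`, `union`, `consolidate`), the choice of the lexicographically smallest identifier as the canonical one, and the decision to rewrite only subject and object positions are all assumptions made for illustration.

```python
OWL_SAMEAS = "http://www.w3.org/2002/07/owl#sameAs"


def find(parent, x):
    """Return the canonical identifier of x's equivalence class."""
    parent.setdefault(x, x)
    root = x
    while parent[root] != root:
        root = parent[root]
    # Path compression: point every node on the path directly at the root.
    while parent[x] != root:
        parent[x], x = root, parent[x]
    return root


def union(parent, a, b):
    """Merge the classes of a and b, keeping the smallest identifier as root."""
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        if rb < ra:
            ra, rb = rb, ra
        parent[rb] = ra


def consolidate(triples):
    """Rewrite subjects and objects to canonical identifiers.

    Hypothetical simplification: predicates are left untouched and the
    owl:sameAs triples themselves are dropped from the output.
    """
    parent = {}
    for s, p, o in triples:
        if p == OWL_SAMEAS:
            union(parent, s, o)

    def canon(term):
        # Terms never mentioned in an owl:sameAs triple are left as-is.
        return find(parent, term) if term in parent else term

    return [(canon(s), p, canon(o)) for s, p, o in triples if p != OWL_SAMEAS]


triples = [
    ("ex:a", OWL_SAMEAS, "ex:b"),
    ("ex:c", OWL_SAMEAS, "ex:b"),
    ("ex:b", "foaf:name", "Alice"),
    ("ex:c", "foaf:knows", "ex:d"),
]
result = consolidate(triples)
```

In this toy input, `ex:a`, `ex:b` and `ex:c` fall into one equivalence class, so both remaining triples are rewritten to use `ex:a` as their subject.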
Document type:
Journal article
Journal of Web Semantics, Elsevier, 2012, special issue on Web-scale Semantic Information Processing, Volume 10, pp. 76-110. 〈10.1016/j.websem.2011.11.002〉

https://hal-emse.ccsd.cnrs.fr/emse-01082486
Contributor: Florent Breuil
Submitted on: Thursday, 13 November 2014 - 15:41:52
Last modified on: Tuesday, 22 March 2016 - 01:16:15

Citation

Aidan Hogan, Antoine Zimmermann, Jürgen Umbrich, Axel Polleres. Scalable and Distributed Methods for Entity Matching, Consolidation and Disambiguation over Linked Data Corpora. Journal of Web Semantics, Elsevier, 2012, special issue on Web-scale Semantic Information Processing, Volume 10, pp. 76-110. 〈10.1016/j.websem.2011.11.002〉. 〈emse-01082486〉
