One Size Does Not Fit All: Querying Web Polystores

Yasar Khan; Antoine Zimmermann; Alokkumar Jha; Vijay Gadepally; Mathieu d'Aquin; Ratnesh Sahay

doi:10.1109/ACCESS.2018.2888601

Journal article, IEEE Access, Year: 2019

Yasar Khan

Role: Author

Insight Centre for Data Analytics

Antoine Zimmermann

Role: Author
PersonId: 4097
IdHAL: antoine-zimmermann
ORCID: 0000-0003-1502-6986
IdRef: 133375676

Laboratoire Hubert Curien

École des Mines de Saint-Étienne

Institut Henri Fayol

Alokkumar Jha

Role: Author

Insight Centre for Data Analytics [Galway]

Vijay Gadepally

Role: Author

MIT Lincoln Laboratory

Mathieu d'Aquin

Role: Author
PersonId: 751213
IdHAL: mathieu-daquin
ORCID: 0000-0001-7276-4702
IdRef: 097476536

Insight Centre for Data Analytics [Galway]

Ratnesh Sahay

Role: Author

Digital Enterprise Research Institute

Abstract

Data retrieval systems are facing a paradigm shift due to the proliferation of specialized data storage engines (SQL, NoSQL, column stores, MapReduce, data stream, and graph) supported by varied data models (CSV, JSON, RDB, RDF, and XML). One immediate consequence of this shift is a data bottleneck over the web: web applications cannot retrieve data at the rate at which different facilities generate them. Especially in the genomics and healthcare verticals, data are growing from petascale to exascale, and biomedical stakeholders expect seamless retrieval of these data over the web. In this paper, we argue that the bottleneck can be reduced by minimizing the costly data conversion process and delegating query processing and performance loads to the specialized storage engines over their native data models. We propose a web-based query federation mechanism, called PolyWeb, that unifies query answering over multiple native data models (CSV, RDB, and RDF). We emphasize two main challenges of query federation over native data models: 1) devising a method to select prospective data sources, with different underlying data models, that can satisfy a given query, and 2) query optimization, join, and execution over different data models. We demonstrate PolyWeb on a cancer genomics use case, where the description of biological and chemical entities (e.g., genes, diseases, drugs, and pathways) often spans multiple data models and their respective storage engines. To assess the benefits and limitations of evaluating queries over native data models, we compare PolyWeb with state-of-the-art query federation engines in terms of result completeness, source selection, and overall query execution time.
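
The two challenges named in the abstract (source selection, then join and execution over native data models) can be illustrated with a toy sketch. This is not PolyWeb's implementation: the class names, the `select_sources`, `hash_join`, and `answer` helpers, and all data values are hypothetical, and only CSV and relational sources are covered (RDF is left out so the example needs nothing beyond the Python standard library). Each source is queried in its native form, and only the join happens in the federation layer.

```python
import csv
import io
import sqlite3

# Hypothetical toy data; none of these values come from the paper.
CSV_TEXT = "gene,disease\nTP53,breast cancer\nBRCA1,ovarian cancer\n"


def make_rdb():
    """Build an in-memory relational source with one illustrative table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE drug_target (gene TEXT, drug TEXT)")
    conn.executemany(
        "INSERT INTO drug_target VALUES (?, ?)",
        [("TP53", "drugA"), ("EGFR", "drugB")],
    )
    return conn


class CsvSource:
    """CSV source read with the csv module, without conversion to another model."""

    def __init__(self, text):
        reader = csv.DictReader(io.StringIO(text))
        self.rows = list(reader)
        self.columns = set(reader.fieldnames or [])

    def scan(self, wanted):
        return [{c: row[c] for c in wanted} for row in self.rows]


class SqliteSource:
    """Relational source; projection is delegated to the SQL engine."""

    def __init__(self, conn, table):
        self.conn = conn
        self.table = table
        info = conn.execute(f"PRAGMA table_info({table})").fetchall()
        self.columns = {row[1] for row in info}  # row[1] is the column name

    def scan(self, wanted):
        if not set(wanted) <= self.columns:
            raise ValueError("unknown column requested")
        cols = ", ".join(wanted)
        cursor = self.conn.execute(f"SELECT {cols} FROM {self.table}")
        return [dict(zip(wanted, row)) for row in cursor]


def select_sources(sources, attributes, key):
    """Keep sources exposing the join key plus at least one other requested attribute."""
    wanted = set(attributes)
    plan = []
    for source in sources:
        covered = sorted(wanted & source.columns)
        if key in covered and len(covered) > 1:
            plan.append((source, covered))
    return plan


def hash_join(left, right, key):
    """Join two lists of dict rows on a shared key using a hash index."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


def answer(sources, attributes, key):
    """Select relevant sources, scan each natively, and join the partial results."""
    result = None
    for source, covered in select_sources(sources, attributes, key):
        rows = source.scan(covered)
        result = rows if result is None else hash_join(result, rows, key)
    return result or []


sources = [CsvSource(CSV_TEXT), SqliteSource(make_rdb(), "drug_target")]
print(answer(sources, ["gene", "disease", "drug"], "gene"))
```

In this sketch a source is considered relevant only if it exposes the join key and at least one other requested attribute, so a query for `gene` and `disease` alone never touches the relational table. A real federation engine would also need cost-based join ordering and filter pushdown, which are omitted here.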

Keywords

Genomics; Engines; Bioinformatics; Cancer; Resource description framework; Data models

Domains

Modeling and simulation

Contributor: Florent Breuil

https://hal-emse.ccsd.cnrs.fr/emse-02008296

Submitted on: Tuesday, February 5, 2019, 16:05:44

Last modified on: Wednesday, October 30, 2024, 19:41:23

Dates and versions

Identifiers

HAL Id: emse-02008296, version 1
DOI: 10.1109/ACCESS.2018.2888601

Cite

Yasar Khan, Antoine Zimmermann, Alokkumar Jha, Vijay Gadepally, Mathieu d'Aquin, et al. One Size Does Not Fit All: Querying Web Polystores. IEEE Access, 2019, 7, pp. 9598-9617. ⟨10.1109/ACCESS.2018.2888601⟩. ⟨emse-02008296⟩


Collections

UNIV-ST-ETIENNE EMSE IOGS CNRS PARISTECH FAYOL-ENSMSE ISCOD-ENSMSE TDS-MACS UDL ANR LABORATOIRE-HUBERT-CURIEN INSTITUT-MINES-TELECOM

