An Integration-Oriented Ontology to Govern Evolution in Big Data Ecosystems
Sergi Nadal, Oscar Romero, Alberto Abelló, Panos Vassiliadis, and Stijn Vansummeren

TL;DR
This paper introduces a novel ontology-based framework for managing schema evolution in Big Data ecosystems, enabling accurate data integration and querying across evolving source schemas.
Contribution
It presents the Big Data Integration ontology and algorithms for query rewriting and semi-automatic ontology adaptation to handle schema evolution.
Findings
Effective query rewriting over evolving schemas
Ontology adaptation maintains query correctness
Validated on real-world API data
Abstract
Big Data architectures allow heterogeneous data from multiple sources to be stored and processed flexibly in their original format. The structure of those data, commonly supplied by means of REST APIs, is continuously evolving. Thus, data analysts need to adapt their analytical processes after each API release. This becomes more challenging when performing an integrated or historical analysis. To cope with such complexity, in this paper we present the Big Data Integration ontology, the core construct to govern the data integration process under schema evolution by systematically annotating it with information regarding the schema of the sources. We present a query rewriting algorithm that, using the annotated ontology, converts queries posed over the ontology to queries over the sources. To cope with syntactic evolution in the sources, we present an algorithm that semi-automatically adapts the…
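To make the idea of rewriting ontology-level queries into source-level queries more concrete, here is a minimal toy sketch. It is not the paper's algorithm: the release names (`api_v1`, `api_v2`), the feature and attribute names, and the functions `rewrite` and `register_release` are all hypothetical, and the "ontology" is reduced to a plain dictionary of per-release annotations.

```python
from typing import Dict, List

# Hypothetical annotations: for each source release, map ontology-level
# feature names to the attribute names that release actually exposes.
MAPPINGS: Dict[str, Dict[str, str]] = {
    "api_v1": {"userName": "user", "createdAt": "created"},
    "api_v2": {"userName": "user_name", "createdAt": "created_at"},
}


def rewrite(features: List[str],
            mappings: Dict[str, Dict[str, str]]) -> Dict[str, List[str]]:
    """Translate a projection over ontology features into one projection
    per source release that covers every requested feature."""
    rewritten: Dict[str, List[str]] = {}
    for release, attr_of in mappings.items():
        # Skip releases that cannot answer the whole query.
        if all(feature in attr_of for feature in features):
            rewritten[release] = [attr_of[feature] for feature in features]
    return rewritten


def register_release(mappings: Dict[str, Dict[str, str]],
                     release: str,
                     attr_of: Dict[str, str]) -> Dict[str, Dict[str, str]]:
    """Return a copy of the annotations extended with a new release,
    leaving the original annotations untouched."""
    updated = {name: dict(attrs) for name, attrs in mappings.items()}
    updated[release] = dict(attr_of)
    return updated
```

In this sketch, a query written once against the feature names keeps working after a new release is registered, because only the annotations change; the analyst's query does not.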
