Optimizing Federated Queries Based on the Physical Design of a Data Lake
Philipp D. Rohde, Maria-Esther Vidal

TL;DR
This paper introduces heuristics for optimizing federated queries in Semantic Data Lakes by leveraging physical design information, leading to faster query processing.
Contribution
It presents a novel approach that incorporates physical design heuristics into query optimization for Semantic Data Lakes, improving efficiency.
Findings
Heuristics based on physical design significantly reduce query execution time.
An implementation in the Ontario federated query engine demonstrates practical improvements.
Exploiting index and normalization knowledge enhances query plan quality.
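To make the idea of an index-aware heuristic concrete, here is a minimal sketch. It is not the paper's algorithm: the rule (delegate a join to the source only when both sub-queries target the same relational source and the join attribute is indexed there), the `SubQuery` type, and the `push_down_join` function are all assumptions introduced purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubQuery:
    """A sub-query routed to a single source (hypothetical structure)."""
    source: str     # identifier of the data source
    model: str      # data model of the source: "rdb" or "rdf"
    join_attr: str  # attribute used to join with the other sub-query


def push_down_join(left, right, indexed):
    """Decide whether a join could be delegated to the source itself.

    Assumed rule, for illustration only: push the join down when both
    sub-queries hit the same relational source and the join attribute
    of each side is indexed. `indexed` is a set of (source, attribute)
    pairs describing the physical design known to the engine.
    """
    same_source = left.source == right.source
    relational = left.model == "rdb" and right.model == "rdb"
    has_index = (
        (left.source, left.join_attr) in indexed
        and (right.source, right.join_attr) in indexed
    )
    return same_source and relational and has_index


# Example physical-design metadata (invented values).
indexed = {("db1", "gene_id")}
a = SubQuery("db1", "rdb", "gene_id")
b = SubQuery("db1", "rdb", "gene_id")
c = SubQuery("kg1", "rdf", "gene_id")

print(push_down_join(a, b, indexed))  # same indexed relational source
print(push_down_join(a, c, indexed))  # sources differ, so join at the engine
```

When the function returns False, a federated engine would typically fetch both intermediate results and join them itself; which operator it picks in that case is outside the scope of this sketch.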
Abstract
The optimization of query execution plans is known to be crucial for reducing the query execution time. In particular, query optimization has been studied thoroughly for relational databases over the past decades. Recently, the Resource Description Framework (RDF) became popular for publishing data on the Web. As a consequence, federations composed of different data models like RDF and relational databases evolved. One type of such federation is the Semantic Data Lake, where every data source is kept in its original data model and semantically annotated with ontologies or controlled vocabularies. However, state-of-the-art query engines for federated query processing over Semantic Data Lakes often rely on optimization techniques tailored for RDF. In this paper, we present query optimization techniques guided by heuristics that take the physical design of a Data Lake into account. The…
Taxonomy
Topics: Semantic Web and Ontologies · Data Quality and Management · Advanced Database Systems and Queries
