Enabling Data Dependency-based Query Optimization
Daniel Lindner, Daniel Ritter, and Felix Naumann

TL;DR
This paper introduces an automated system that discovers and validates data dependencies beyond primary and foreign keys and uses them to optimize query plans, reporting geometric mean speedups of 35% on TPC-DS and 29% on JOB.
Contribution
It presents a novel system that automatically discovers, validates, and utilizes data dependencies for query optimization without manual intervention or SQL rewrites.
Findings
Data dependencies beyond primary keys (PKs) and foreign keys (FKs) improve query performance across a wide range of analytical database systems.
The system achieves geometric mean speedups of 35% (TPC-DS) and 29% (JOB).
Query latencies are reduced by more than 90% in some cases.
Abstract
Primary key (PK) and foreign key (FK) constraints are widely used for query optimization. Knowledge about additional data dependencies, such as order dependencies, enables further substantial performance improvements. However, such dependencies are not maintained by database systems or are even unknown to the user. Identifying and validating relevant dependencies automatically and efficiently remains an unsolved problem. This paper presents a system that (i) recognizes dependency candidates for optimization, (ii) efficiently validates their applicability, and (iii) optimizes query plans using valid dependencies. First, we demonstrate the performance impact of optimization techniques using data dependencies additional to PKs and FKs. Using rewritten SQL queries, we empirically show that data dependencies improve performance for a wide range of analytical database systems and…
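To make the validation step (ii) more concrete, the following is a minimal sketch of how an order dependency candidate could be checked against a table. It is an illustration under stated assumptions, not the validation algorithm used in the paper: the function name `holds_order_dependency`, the column names `date_key` and `calendar_date`, and the sample rows are all hypothetical. The check uses the common definition that column A orders column B if, for every pair of rows, a smaller-or-equal value in A implies a smaller-or-equal value in B.

```python
from typing import Any, Mapping, Sequence


def holds_order_dependency(
    rows: Sequence[Mapping[str, Any]], a: str, b: str
) -> bool:
    """Return True if ordering the rows by column `a` also orders them by `b`.

    Hypothetical helper for illustration only; it sorts the projected
    (a, b) pairs and inspects neighbouring pairs.
    """
    pairs = sorted((row[a], row[b]) for row in rows)
    for (a1, b1), (a2, b2) in zip(pairs, pairs[1:]):
        # A larger `a` must never come with a smaller `b`.
        if b1 > b2:
            return False
        # Equal `a` values must map to equal `b` values.
        if a1 == a2 and b1 != b2:
            return False
    return True


# Hypothetical date dimension: the surrogate key grows with the calendar date.
date_rows = [
    {"date_key": 3, "calendar_date": "2024-01-03"},
    {"date_key": 1, "calendar_date": "2024-01-01"},
    {"date_key": 2, "calendar_date": "2024-01-02"},
]

# Appending a row whose key is larger but whose date is earlier breaks the dependency.
broken_rows = date_rows + [{"date_key": 4, "calendar_date": "2023-12-31"}]

candidate_is_valid = holds_order_dependency(date_rows, "date_key", "calendar_date")
broken_is_valid = holds_order_dependency(broken_rows, "date_key", "calendar_date")
```

If such a dependency holds, an optimizer could, for example, translate a range predicate on the date column into a range predicate on the key column. Sorting makes this sketch O(n log n) per candidate; a production system would likely rely on cheaper checks, such as existing statistics or sorted segments, where available.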
