Size bounds and query plans for relational joins
Albert Atserias, Martin Grohe, Dániel Marx

TL;DR
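This paper gives theoretical bounds on the output size of relational joins and on the cost of plans that evaluate them. It ties the worst-case size to a parameter of the query's hypergraph, studies an average-case model in which the database follows a known probability distribution, and shows when join-project plans pay off.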
Contribution
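It characterizes the worst-case join size by the fractional edge cover number of the query's hypergraph, proving a lower bound that matches the previously known upper bound, and it relates plan quality in the average-case model to the density of the hypergraph.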
Findings
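The worst-case join size is characterized by the fractional edge cover number of the query's hypergraph.
For some queries, join-project plans can outperform every plan that uses no projections.
In the average-case model, plans can be made projection-free with only a constant-factor increase in expected evaluation time.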
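To make the first finding concrete, here is a minimal Python sketch of the size bound in its standard product form: for any fractional edge cover with weights x_e, the join size is at most the product of |R_e|^(x_e). The triangle query R(a,b), S(b,c), T(a,c), the weights of 1/2, and all function names are assumptions chosen for illustration; they do not come from this page.

```python
from itertools import product
from math import prod

# Hypothetical example query: the triangle join R(a,b) ⋈ S(b,c) ⋈ T(a,c).
# Each relation name maps to the attributes (hyperedge) it covers.
EDGES = {"R": ("a", "b"), "S": ("b", "c"), "T": ("a", "c")}

# Assumed fractional edge cover: weight 1/2 on every edge.
WEIGHTS = {"R": 0.5, "S": 0.5, "T": 0.5}


def is_fractional_edge_cover(edges, weights, tol=1e-9):
    """Return True if every attribute receives total weight >= 1."""
    attributes = set().union(*edges.values())
    return all(
        sum(weights[name] for name, attrs in edges.items() if a in attrs) >= 1 - tol
        for a in attributes
    )


def size_bound(relations, weights):
    """Upper bound on the join size: product of |R_e| ** x_e over all relations."""
    return prod(len(rel) ** weights[name] for name, rel in relations.items())


def triangle_join(R, S, T):
    """Brute-force evaluation of the triangle join, for checking the bound."""
    return {
        (a, b, c)
        for (a, b) in R
        for (b2, c) in S
        if b2 == b and (a, c) in T
    }


# Worst-case style instance: every relation holds all pairs over a domain of size k.
k = 4
all_pairs = set(product(range(k), repeat=2))
relations = {"R": all_pairs, "S": all_pairs, "T": all_pairs}

bound = size_bound(relations, WEIGHTS)
result = triangle_join(relations["R"], relations["S"], relations["T"])
print(is_fractional_edge_cover(EDGES, WEIGHTS), bound, len(result))
```

On this instance each relation has N = k² = 16 tuples, and both the bound N^(3/2) and the actual join size equal 64, so the bound is tight here. In contrast, the integral cover that picks two whole edges only gives N² = 256.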
Abstract
Relational joins are at the core of relational algebra, which in turn is the core of the standard database query language SQL. As their evaluation is expensive and very often dominated by the output size, it is an important task for database query optimisers to compute estimates on the size of joins and to find good execution plans for sequences of joins. We study these problems from a theoretical perspective, both in the worst-case model, and in an average-case model where the database is chosen according to a known probability distribution. In the former case, our first key observation is that the worst-case size of a query is characterised by the fractional edge cover number of its underlying hypergraph, a combinatorial parameter previously known to provide an upper bound. We complete the picture by proving a matching lower bound, and by showing that there exist queries for which the…
Taxonomy
Topics: Advanced Database Systems and Queries · Data Management and Algorithms · Constraint Satisfaction and Optimization
