Sidorenko-Inspired Pessimistic Estimation
Yu-Ting Lin, Hsin-Po Wang

TL;DR
This paper extends an information-theoretic framework for upper bounding join sizes in databases. Earlier work used statistics based on stars and then bi-stars; this paper generalizes them to caterpillars, improving estimation accuracy.
Contribution
It introduces caterpillar-based bounds inspired by Sidorenko's conjecture, enhancing previous star and bi-star bounds for join size estimation.
Findings
Caterpillar bounds overestimate join size by about m^{3/5}
Bi-star bounds overestimate by about m^{3/4}
Simulations show high R-squared (>0.98) for regression models
Abstract
Recently, Abo Khamis et al. showed how to upper bound the size of a join of multiple tables, a problem essential to query optimization in database theory. They unified earlier works by the following information-theoretical framework.
1. Let a row be selected from the join uniformly at random.
2. The logarithm of the join size is now the entropy of that row.
3. To upper bound this entropy, break it into several local entropies using Shannon-type inequalities.
4. Upper bound the local entropies using statistics of the tables being joined.
The statistics Abo Khamis et al. considered are the counts of graph homomorphisms from stars to the tables. In a follow-up work, we generalized stars to bi-stars. In this paper, we generalize bi-stars to caterpillars, an even larger class of graphs inspired by Sidorenko's…
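The four-step framework in the abstract can be illustrated on a toy two-table join. The sketch below is not the paper's method: the tables `R` and `S`, their contents, and the helper names are invented for illustration, and the only statistic used is the maximum degree of the join column, which is far simpler than the star, bi-star, or caterpillar counts the paper studies. It checks that the entropy of a uniformly random joined row equals the logarithm of the join size, then bounds the join size through the chain rule H(A,B,C) = H(A,B) + H(C|A,B) <= H(A,B) + H(C|B).

```python
import math
from collections import Counter

# Hypothetical toy tables (assumed to be duplicate-free), for illustration only.
R = [(1, 10), (2, 10), (3, 20)]            # rows (a, b)
S = [(10, "x"), (10, "y"), (20, "z")]      # rows (b, c)


def join(r, s):
    """Natural join of r(a, b) and s(b, c) on the shared column b."""
    return [(a, b, c) for (a, b) in r for (b2, c) in s if b == b2]


def entropy_bits(counts):
    """Shannon entropy, in bits, of the empirical distribution given by counts."""
    total = sum(counts)
    return -sum(n / total * math.log2(n / total) for n in counts if n)


rows = join(R, S)

# Steps 1-2: a uniformly random joined row has entropy log2(join size).
h_joint = entropy_bits(Counter(rows).values())

# Steps 3-4: H(A,B,C) <= H(A,B) + H(C|B).
# H(A,B) <= log2 |R| because (A,B) is always a row of R, and
# H(C|B) <= log2 of the largest number of S-rows sharing one b value.
max_degree = max(Counter(b for b, _ in S).values())
bound = len(R) * max_degree

print("join size:", len(rows))
print("2 ** H(row):", 2 ** h_joint)
print("upper bound |R| * max_degree:", bound)
```

On this toy input the join has 5 rows and the bound is 6, so the bound is valid but not tight. Richer statistics exist precisely to shrink this kind of gap.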
