Instance Optimal Join Size Estimation
Mahmoud Abo-Khamis, Sungjin Im, Benjamin Moseley, Kirk Pruhs, Alireza Samadian

TL;DR
This paper introduces an instance optimal algorithm for estimating the size of relational join results efficiently, regardless of join size, and also provides a method for uniform sampling of join rows.
Contribution
It presents the first instance optimal join size estimation algorithm that works for all instances, removing the previous dependency on join size.
Findings
Algorithm achieves instance optimality for all join sizes.
Provides a method for uniform random sampling of join rows.
Removes the dependence of the estimation run time on the join size.
Abstract
We consider the problem of efficiently estimating the size of the inner join of a collection of preprocessed relational tables from the perspective of instance optimality analysis. The run time of instance optimal algorithms is comparable to the minimum time needed to verify the correctness of a solution. Previously, instance optimal algorithms were only known when the size of the join was small, because one component of their run time was linear in the join size. We give an instance optimal algorithm for estimating the join size for all instances, including when the join size is large, by removing the dependency on the join size. As a byproduct, we show how to sample rows from the join uniformly at random in a comparable amount of time.
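To make the two tasks in the abstract concrete, here is a minimal Python sketch for the simplest case: a two-table join R(a, b) ⋈ S(b, c) on a single key. It is not the paper's algorithm and makes no instance-optimality claim. The function names `join_size` and `sample_join_row` and the tuple-based table layout are assumptions chosen for illustration. The sketch only shows that the join size, and a uniform random join row, can be obtained from per-key counts without materializing the join.

```python
import random
from collections import Counter, defaultdict

def join_size(r, s):
    """Size of R(a, b) joined with S(b, c) on b, computed from per-key counts."""
    r_deg = Counter(b for _, b in r)   # rows of R per key
    s_deg = Counter(b for b, _ in s)   # rows of S per key
    # Each key contributes (rows in R) * (rows in S) join rows.
    return sum(n * s_deg[b] for b, n in r_deg.items())

def sample_join_row(r, s, rng=random):
    """Return one join row (a, b, c) uniformly at random, or None if the join is empty."""
    s_by_key = defaultdict(list)
    for b, c in s:
        s_by_key[b].append(c)
    # Weight each row of R by its number of join partners in S.
    weights = [len(s_by_key[b]) for _, b in r]
    if sum(weights) == 0:
        return None
    a, b = rng.choices(r, weights=weights, k=1)[0]
    # Then pick one of that row's partners uniformly.
    return (a, b, rng.choice(s_by_key[b]))
```

Picking a row of R with probability proportional to its number of partners, then a partner uniformly, gives every join row the same probability of 1 / (join size). Both functions in this toy version scan the input tables in full, which is exactly the kind of cost an instance optimal method aims to avoid when a shorter certificate exists.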
Taxonomy
Topics: Advanced Database Systems and Queries · Data Management and Algorithms · Analytical Chemistry and Chromatography
