Towards Output-Optimal Uniform Sampling and Approximate Counting for Join-Project Queries
Xiao Hu, Jinchao Huang

TL;DR
This paper introduces the first asymptotically optimal algorithms for uniform sampling and approximate counting over fundamental classes of join-project queries (matrix, star, and chain), with polynomial speedups over the prior state of the art and matching lower bounds.
Contribution
It presents a novel rejection-based sampling strategy and a hybrid counting reduction that are optimal for matrix, star, and chain join-project queries, closing the gap left by earlier optimal results that covered only full join queries.
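To make the idea of rejection-based sampling over a projection concrete, here is a minimal sketch for the matrix query π_{A,C}(R(A,B) ⋈ S(B,C)). It is a textbook baseline, not the paper's algorithm, and it makes no claim to the optimal running time: it draws a uniform tuple from the full join and accepts it with probability 1/w(a,c), where w(a,c) is the number of B-values witnessing the pair (a,c), so every distinct output pair is returned with equal probability. The names `build_index` and `sample_projected` and the `max_tries` cutoff are hypothetical, chosen for illustration only.

```python
import random
from collections import defaultdict


def build_index(R, S):
    """Index R(A,B) and S(B,C), given as iterables of pairs, by the join attribute B."""
    a_by_b = defaultdict(list)   # b -> list of a with (a, b) in R
    c_by_b = defaultdict(list)   # b -> list of c with (b, c) in S
    b_by_a = defaultdict(set)    # a -> set of b with (a, b) in R
    b_by_c = defaultdict(set)    # c -> set of b with (b, c) in S
    for a, b in sorted(set(R)):  # set() removes duplicate tuples
        a_by_b[b].append(a)
        b_by_a[a].add(b)
    for b, c in sorted(set(S)):
        c_by_b[b].append(c)
        b_by_c[c].add(b)
    # Only B-values present in both relations contribute join tuples.
    joinable = [b for b in a_by_b if b in c_by_b]
    # Each joinable b contributes deg_R(b) * deg_S(b) full-join tuples.
    weights = [len(a_by_b[b]) * len(c_by_b[b]) for b in joinable]
    return a_by_b, c_by_b, b_by_a, b_by_c, joinable, weights


def sample_projected(index, rng, max_tries=10_000):
    """Return one uniform sample of the projected output, or None if none is found."""
    a_by_b, c_by_b, b_by_a, b_by_c, joinable, weights = index
    if not joinable:
        return None  # the join is empty
    for _ in range(max_tries):
        # Step 1: draw a uniform tuple (a, b, c) from the full join.
        b = rng.choices(joinable, weights=weights, k=1)[0]
        a = rng.choice(a_by_b[b])
        c = rng.choice(c_by_b[b])
        # Step 2: accept with probability 1 / (number of witnesses of (a, c)).
        witnesses = len(b_by_a[a] & b_by_c[c])
        if rng.random() < 1.0 / witnesses:
            return (a, c)
    return None  # gave up after max_tries rejections


if __name__ == "__main__":
    R = [(1, 10), (1, 11), (2, 10)]
    S = [(10, 5), (11, 5), (10, 6)]
    print(sample_projected(build_index(R, S), random.Random(0)))
```

The expected number of trials in this baseline is the ratio of the full join size to the projected output size, which grows large exactly when the projection shrinks the output substantially. That is the regime the abstract identifies as problematic for prior approaches.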
Findings
Achieved polynomial speedups over the state-of-the-art "propose-and-verify" framework.
Established matching communication complexity lower bounds.
Proved sublinear algorithms are impossible for chain queries.
Abstract
Uniform sampling and approximate counting are fundamental primitives for modern database applications, ranging from query optimization to approximate query processing. While recent breakthroughs have established optimal sampling and counting algorithms for full join queries, a significant gap remains for join-project queries, which are ubiquitous in real-world workloads. The state-of-the-art "propose-and-verify" framework \cite{chen2020random} for these queries suffers from fundamental inefficiencies, often yielding prohibitive complexity when projections significantly reduce the output size. In this paper, we present the first asymptotically optimal algorithms for fundamental classes of join-project queries, including matrix, star, and chain queries. By leveraging a novel rejection-based sampling strategy and a hybrid counting reduction, we achieve polynomial speedups over the…
