Understanding the hardness of approximate query processing with joins
Tianyu Liu, Chi Wang

TL;DR
This paper investigates the fundamental limits of approximate query processing involving joins, establishing lower bounds on the information needed for accurate approximations and comparing these bounds with existing sampling methods.
Contribution
It provides the first information-theoretic lower bounds for AQP with joins, extending prior work and clarifying the inherent difficulty of approximate cardinality estimation.
Findings
Lower bounds are linear in the size of the largest table for most join queries.
Bernoulli sampling matches the lower bounds for COUNT queries over multiple tables.
Accurate approximation of join queries requires substantial information, especially when results are not guaranteed to be large.
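To make the Bernoulli sampling finding concrete, here is a minimal sketch of a sampling-based COUNT estimator for a two-table equi-join. It is an illustration under stated assumptions, not necessarily the exact scheme analyzed in the paper: the function names, the per-table sampling rates `p_left` and `p_right`, and the representation of each table as a list of join-key values are all hypothetical.

```python
import random
from collections import Counter


def bernoulli_sample(rows, p, rng):
    """Keep each row independently with probability p (Bernoulli sampling)."""
    return [r for r in rows if rng.random() < p]


def join_count(left_keys, right_keys):
    """Exact COUNT(*) of an equi-join, given the join-key value of every row."""
    right_freq = Counter(right_keys)
    return sum(right_freq[k] for k in left_keys)


def estimate_join_count(left_keys, right_keys, p_left, p_right, seed=0):
    """Estimate the join COUNT from independent Bernoulli samples of both tables.

    Every joining pair of rows survives sampling with probability
    p_left * p_right, so dividing the sampled join size by that product
    gives an unbiased estimate. (Illustrative assumption, not the paper's code.)
    """
    if not (0 < p_left <= 1 and 0 < p_right <= 1):
        raise ValueError("sampling rates must lie in (0, 1]")
    rng = random.Random(seed)
    sampled_left = bernoulli_sample(left_keys, p_left, rng)
    sampled_right = bernoulli_sample(right_keys, p_right, rng)
    return join_count(sampled_left, sampled_right) / (p_left * p_right)


if __name__ == "__main__":
    left = [1, 1, 2]
    right = [1, 2, 2, 3]
    print(join_count(left, right))                     # exact join size
    print(estimate_join_count(left, right, 0.5, 0.5))  # sampled estimate
```

The estimator is unbiased, but when the true join result is small, few or no joining pairs survive sampling and the relative error can be large. That is in line with the finding that accurate approximation requires substantial information unless results are guaranteed to be large.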
Abstract
We study the hardness of Approximate Query Processing (AQP) of various types of queries involving joins over multiple tables of possibly different sizes. In the case where the query result is a single value (e.g., COUNT, SUM, and COUNT(DISTINCT)), we prove worst-case information-theoretic lower bounds for AQP problems that are given parameters ε and δ, and return estimated results within a factor of 1+ε of the true results with error probability at most δ. In particular, the lower bounds for cardinality estimation over joins under various settings are contained in our results. Informally, our results show that for various database queries with joins, unless restricted to the set of queries whose results are always guaranteed to be above a very large threshold, the amount of information an AQP algorithm needs for returning an accurate approximation is at…
Taxonomy
Topics: Data Management and Algorithms · Advanced Database Systems and Queries · Algorithms and Data Compression
