Combining Aggregation and Sampling (Nearly) Optimally for Approximate Query Processing
Xi Liang, Stavros Sintos, Zechao Shang, Sanjay Krishnan

TL;DR
This paper introduces PASS, a physical design for approximate query processing that combines precomputed aggregates with stratified sampling. PASS builds a tree of partial aggregates that answers queries exactly where their predicates align with the partitions and approximately elsewhere, which helps most on selective queries.
Contribution
The paper proposes PASS, a novel data structure that integrates precomputed aggregates with stratified sampling, along with an algorithm for optimal data partitioning.
Findings
PASS improves the accuracy of approximate query results.
The approach efficiently handles selective queries.
It offers reliable confidence intervals even with small samples.
Abstract
Sample-based approximate query processing (AQP) suffers from many pitfalls such as the inability to answer very selective queries and unreliable confidence intervals when sample sizes are small. Recent research presented an intriguing solution of combining materialized, pre-computed aggregates with sampling for accurate and more reliable AQP. We explore this solution in detail in this work and propose an AQP physical design called PASS, or Precomputation-Assisted Stratified Sampling. PASS builds a tree of partial aggregates that cover different partitions of the dataset. The leaf nodes of this tree form the strata for stratified samples. Aggregate queries whose predicates align with the partitions (or unions of partitions) are exactly answered with a depth-first search, and any partial overlaps are approximated with the stratified samples. We propose an algorithm for optimally…
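The query-answering idea in the abstract can be illustrated with a small sketch. It assumes a single numeric key column with `SUM` queries over a key range, partition boundaries supplied by hand (it does not implement the paper's optimal partitioning algorithm), a uniform sample of up to `k` rows per leaf stratum, and a standard stratified-sampling variance estimate. The names `Node`, `build`, `query_sum`, `fanout`, and `k` are hypothetical illustrations, not the authors' code. Fully covered nodes contribute their exact precomputed sums, disjoint nodes contribute nothing, and only partially overlapped leaves fall back to their samples.

```python
import math
import random
from dataclasses import dataclass, field
from typing import List, Tuple

Row = Tuple[float, float]  # hypothetical row layout: (key, value)


@dataclass
class Node:
    lo: float      # inclusive lower bound of the node's key range
    hi: float      # exclusive upper bound of the node's key range
    total: float   # exact precomputed SUM(value) over the range
    count: int     # exact precomputed COUNT(*) over the range
    children: List["Node"] = field(default_factory=list)
    sample: List[Row] = field(default_factory=list)  # only leaves (strata) hold samples


def build(rows: List[Row], bounds: List[float], fanout: int, k: int,
          rng: random.Random) -> Node:
    """Build a tree of partial aggregates; consecutive bounds define the leaf strata."""
    def make(lo_i: int, hi_i: int) -> Node:
        lo, hi = bounds[lo_i], bounds[hi_i]
        part = [r for r in rows if lo <= r[0] < hi]
        node = Node(lo, hi, sum(v for _, v in part), len(part))
        if hi_i - lo_i == 1:
            # Leaf: keep a uniform sample of up to k rows from this stratum.
            node.sample = rng.sample(part, min(k, len(part)))
        else:
            # Internal node: split its leaf range into at most `fanout` children.
            step = math.ceil((hi_i - lo_i) / fanout)
            for s in range(lo_i, hi_i, step):
                node.children.append(make(s, min(s + step, hi_i)))
        return node
    return make(0, len(bounds) - 1)


def query_sum(node: Node, a: float, b: float) -> Tuple[float, float]:
    """Answer SUM(value) WHERE a <= key < b with a depth-first search.

    Returns (estimate, variance); the variance is 0.0 when the answer is exact.
    """
    if b <= node.lo or a >= node.hi:       # disjoint: contributes nothing
        return 0.0, 0.0
    if a <= node.lo and node.hi <= b:      # fully covered: use the exact aggregate
        return node.total, 0.0
    if node.children:                      # partial overlap: descend into children
        est = var = 0.0
        for child in node.children:
            e, v = query_sum(child, a, b)
            est += e
            var += v
        return est, var
    # Partially overlapped leaf: estimate this stratum's share from its sample.
    n, N = len(node.sample), node.count
    if n == 0:
        return 0.0, 0.0
    ys = [v if a <= key < b else 0.0 for key, v in node.sample]
    mean = sum(ys) / n
    s2 = sum((y - mean) ** 2 for y in ys) / (n - 1) if n > 1 else 0.0
    fpc = (N - n) / N                      # finite population correction
    return N * mean, N * N * fpc * s2 / n
```

For example, `est, var = query_sum(root, 10.0, 60.0)` gives a point estimate, and `est ± 1.96 * math.sqrt(var)` is an approximate 95% confidence interval. Because only the partially overlapped leaves add variance, the interval narrows as more of the query range aligns with precomputed partitions.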
Taxonomy
Topics: Data Management and Algorithms · Advanced Database Systems and Queries · Data Stream Mining Techniques
