Rapid Approximate Aggregation with Distribution-Sensitive Interval Guarantees

Stephen Macke; Maryam Aliakbarpour; Ilias Diakonikolas; Aditya Parameswaran; Ronitt Rubinfeld

arXiv:2008.03891·cs.DB·August 11, 2020

TL;DR

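This paper introduces a confidence interval (CI) method for approximate data aggregation that is both correct, meaning it carries rigorous rather than merely asymptotic guarantees, and tighter than traditional conservative approaches. Tighter intervals mean fewer samples are needed for a given CI width, which speeds up query processing.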

Contribution

The paper develops a CI technique that addresses two issues affecting conservative CIs, pessimistic mass allocation (PMA) and phantom outlier sensitivity (PHOS), and so yields tighter intervals and better efficiency than traditional methods.

Findings

01. Achieves up to 124x speedup over traditional AQP techniques that offer guarantees.

02. Requires fewer samples to reach the same CI width.

03. Provides CIs that are more reliable than asymptotic techniques and tighter than conservative ones.

Abstract

Aggregating data is fundamental to data analytics, data exploration, and OLAP. Approximate query processing (AQP) techniques are often used to accelerate computation of aggregates using samples, for which confidence intervals (CIs) are widely used to quantify the associated error. CIs used in practice fall into two categories: techniques that are tight but not correct, i.e., they yield tight intervals but only offer asymptotic guarantees, making them unreliable, or techniques that are correct but not tight, i.e., they offer rigorous guarantees, but are overly conservative, leading to confidence intervals that are too loose to be useful. In this paper, we develop a CI technique that is both correct and tighter than traditional approaches. Starting from conservative CIs, we identify two issues they often face: pessimistic mass allocation (PMA) and phantom outlier sensitivity (PHOS). By…
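To make the two categories of CI concrete, here is a minimal sketch for the mean of a sample. It is not the paper's method. It assumes, purely for illustration, a CLT-based interval as the "tight but not correct" example and a Hoeffding-based interval as the "correct but not tight" example; the abstract does not name specific techniques. The function names and the bounds `lo` and `hi` are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean, stdev


def clt_interval(sample, delta=0.05):
    """Asymptotic CI for the mean: tight, but only valid as n grows large."""
    n = len(sample)
    z = NormalDist().inv_cdf(1 - delta / 2)  # two-sided normal quantile
    half = z * stdev(sample) / math.sqrt(n)
    m = fmean(sample)
    return m - half, m + half


def hoeffding_interval(sample, lo, hi, delta=0.05):
    """Conservative CI for the mean of values known to lie in [lo, hi].

    Holds for any sample size, but the width depends only on the range
    (hi - lo), not on how the data are actually distributed.
    """
    n = len(sample)
    half = (hi - lo) * math.sqrt(math.log(2 / delta) / (2 * n))
    m = fmean(sample)
    return m - half, m + half


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical column: values in [0, 100], most of them small.
    sample = [min(100.0, rng.expovariate(1 / 5)) for _ in range(1000)]
    print("CLT:      ", clt_interval(sample))
    print("Hoeffding:", hoeffding_interval(sample, 0.0, 100.0))
```

In this sketch the Hoeffding half-width is driven entirely by the assumed range, so a column whose values are mostly small but can in principle reach 100 gets a wide interval even if no large value is ever sampled. That is the kind of looseness the abstract attributes to conservative CIs.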
