Accelerating Approximate Aggregation Queries with Expensive Predicates

Daniel Kang; John Guibas; Peter Bailis; Tatsunori Hashimoto; Yi Sun; Matei Zaharia

arXiv:2108.06313·cs.DB·August 16, 2021

TL;DR

This paper introduces ABae, a query processing algorithm that accelerates approximate aggregation queries whose predicates are evaluated by expensive deep neural networks. It combines stratified sampling with cheap proxies and reduces labeling costs by up to 2.3x on real datasets.

Contribution

The paper develops ABae, a method for accelerating approximate aggregation queries with costly predicates. It handles stratified sampling in which some sampled records do not satisfy the predicate, and it converges at an optimal rate in that setting.
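To make the idea of proxy-based stratified sampling concrete, here is a minimal sketch in Python. It is not the authors' implementation. The function names (`estimate_mean`, `demo`), the two-stage split, the quantile-based strata, and the allocation weight `sqrt(p_k) * sigma_k` are all assumptions chosen for illustration, and the data in `demo` is synthetic. The sketch stratifies records by a cheap proxy score, spends part of the labeling budget on a pilot sample, then directs the rest toward strata that look most useful, while accounting for sampled records that fail the predicate.

```python
import random
import statistics


def estimate_mean(records, proxy, oracle, budget, num_strata=4,
                  pilot_frac=0.5, seed=0):
    """Estimate the mean value over records that satisfy an expensive predicate.

    proxy(r)  -> cheap score (hypothetical stand-in for a proxy model)
    oracle(r) -> (matches, value); the expensive call we want to minimize
    """
    rng = random.Random(seed)

    # Stratify by proxy score: sort, then cut into equal-sized groups.
    ranked = sorted(records, key=proxy)
    size = len(ranked) // num_strata
    strata = [ranked[i * size:(i + 1) * size] for i in range(num_strata - 1)]
    strata.append(ranked[(num_strata - 1) * size:])
    for stratum in strata:
        rng.shuffle(stratum)  # a prefix of a shuffled list is a random sample

    labels = [[] for _ in strata]  # oracle outputs collected per stratum
    used = [0] * num_strata        # records already labeled per stratum

    def draw(k, n):
        n = min(n, len(strata[k]) - used[k])
        for r in strata[k][used[k]:used[k] + n]:
            labels[k].append(oracle(r))
        used[k] += n

    # Stage 1: uniform pilot sample in every stratum.
    pilot = int(budget * pilot_frac) // num_strata
    for k in range(num_strata):
        draw(k, pilot)

    # Stage 2: allocate the remaining budget by sqrt(p_k) * sigma_k
    # (assumed weighting), where p_k is the estimated match rate and
    # sigma_k the spread of matching values in stratum k.
    weights = []
    for k in range(num_strata):
        vals = [v for m, v in labels[k] if m]
        p = len(vals) / max(len(labels[k]), 1)
        sigma = statistics.pstdev(vals) if len(vals) > 1 else 0.0
        weights.append((p ** 0.5) * sigma)
    total = sum(weights)
    remaining = budget - sum(used)
    for k in range(num_strata):
        share = weights[k] / total if total > 0 else 1 / num_strata
        draw(k, int(remaining * share))

    # Combine strata: weight each stratum mean by its size and match rate.
    num = den = 0.0
    for k in range(num_strata):
        vals = [v for m, v in labels[k] if m]
        if not vals:
            continue  # no matching records were sampled in this stratum
        p = len(vals) / len(labels[k])
        w = len(strata[k]) / len(records)
        num += w * p * statistics.fmean(vals)
        den += w * p
    return num / den if den > 0 else float("nan")


def demo(seed=1):
    """Synthetic check: records are (proxy_score, matches, value) tuples."""
    rng = random.Random(seed)
    records = []
    for _ in range(10_000):
        score = rng.random()
        records.append((score, rng.random() < score, rng.gauss(10.0, 2.0)))
    calls = [0]

    def oracle(r):
        calls[0] += 1  # count expensive predicate evaluations
        return r[1], r[2]

    est = estimate_mean(records, proxy=lambda r: r[0], oracle=oracle,
                        budget=1000)
    truth = statistics.fmean([v for _, m, v in records if m])
    return est, truth, calls[0]
```

Calling `demo()` returns the estimate, the exact answer computed over all records, and the number of oracle calls, which stays within the budget. The design point is that records failing the predicate still consume budget, so the per-stratum match rate has to enter both the allocation and the final weighting.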

Findings

01. ABae reduces labeling costs by up to 2.3x on real datasets.

02. ABae converges at an optimal rate in stratified sampling with non-satisfying draws.

03. The method outperforms baseline approaches in experiments.

Abstract

Researchers and industry analysts are increasingly interested in computing aggregation queries over large, unstructured datasets with selective predicates that are computed using expensive deep neural networks (DNNs). Because these DNNs are expensive and many applications can tolerate approximate answers, analysts are interested in accelerating these queries via approximations. Unfortunately, standard approximate query processing techniques for accelerating such queries are not applicable because they assume the results of the predicates are available ahead of time. Furthermore, recent work using cheap approximations (i.e., proxies) does not support aggregation queries with predicates. To accelerate aggregation queries with expensive predicates, we develop and analyze a query processing algorithm that leverages proxies (ABae). ABae must account for the key challenge that it may sample…

Code & Models

Repositories

stanford-futuredata/abae (Official)
