Accelerating Aggregation Queries on Unstructured Streams of Data

Matthew Russo; Tatsunori Hashimoto; Daniel Kang; Yi Sun; Matei Zaharia

arXiv:2308.09157·cs.DB·August 21, 2023

TL;DR

InQuest is a system that accelerates aggregation queries over unstructured data streams by using approximation models and sampling, providing real-time results with statistical guarantees and reducing reliance on expensive high-precision models.

Contribution

The paper introduces InQuest, a streaming system that efficiently answers aggregation queries over multi-modal data streams with theoretical guarantees on error.

Findings

01 Achieves the same RMSE as baselines with 5x fewer oracle calls

02 Provides up to 1.9x lower RMSE than batch algorithms at a fixed oracle budget

03 Demonstrates effectiveness on six real-world datasets

Abstract

Analysts and scientists are interested in querying streams of video, audio, and text to extract quantitative insights. For example, an urban planner may wish to measure congestion by querying the live feed from a traffic camera. Prior work has used deep neural networks (DNNs) to answer such queries in the batch setting. However, much of this work is not suited for the streaming setting because it requires access to the entire dataset before a query can be submitted or is specific to video. Thus, to the best of our knowledge, no prior work addresses the problem of efficiently answering queries over multiple modalities of streams. In this work we propose InQuest, a system for accelerating aggregation queries on unstructured streams of data with statistical guarantees on query accuracy. InQuest leverages inexpensive approximation models ("proxies") and sampling techniques to limit the…
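
To make the proxy-plus-sampling idea concrete, here is a minimal sketch of generic proxy-stratified sampling over a single buffered window of records. It is not InQuest's algorithm. The function name `stratified_estimate`, its parameters, the equal per-stratum allocation, and the fixed window are all assumptions chosen for illustration. The cheap `proxy` is used only to group records, and the expensive `oracle` is called only on the sampled records, so the number of oracle calls never exceeds `budget`.

```python
import random
from statistics import mean


def stratified_estimate(records, proxy, oracle, budget, n_strata=3, seed=0):
    """Estimate the mean of oracle(r) over records with at most `budget` oracle calls.

    Hypothetical helper for illustration; not part of the InQuest codebase.
    """
    if not records or budget < 1:
        raise ValueError("need at least one record and a budget of at least 1")
    rng = random.Random(seed)
    n = len(records)
    # Never use more strata than the budget or the data can support.
    n_strata = max(1, min(n_strata, budget, n))
    # Rank by the cheap proxy score, then cut the ranking into contiguous strata.
    ranked = sorted(records, key=proxy)
    bounds = [i * n // n_strata for i in range(n_strata + 1)]
    strata = [ranked[lo:hi] for lo, hi in zip(bounds, bounds[1:])]
    # Equal allocation across strata (a simplifying assumption).
    per_stratum = budget // n_strata
    estimate, calls = 0.0, 0
    for stratum in strata:
        k = min(per_stratum, len(stratum))
        sample = rng.sample(stratum, k)  # sample without replacement
        calls += k
        # Weight each stratum's sample mean by its share of the window.
        estimate += (len(stratum) / n) * mean(oracle(r) for r in sample)
    return estimate, calls


# Toy usage: the proxy here happens to track the oracle closely.
window = list(range(100))
estimate, calls = stratified_estimate(
    window, proxy=lambda r: r, oracle=lambda r: float(r), budget=30
)
print(estimate, calls)
```

When the proxy correlates with the oracle, records within a stratum have similar oracle values, so each stratum's sample mean has low variance and the weighted estimate is tighter than uniform sampling at the same budget. A streaming system would additionally need to form strata and allocate samples without seeing the whole window in advance, which this sketch does not attempt.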

Code & Models

Repositories

stanford-futuredata/inquest (official)
