On Efficient Approximate Queries over Machine Learning Models
Dujian Ding, Sihem Amer-Yahia, Laks VS Lakshmanan

TL;DR
This paper introduces a unified framework for approximate query answering over machine learning models that minimizes expensive oracle calls by combining proxies and probabilistic quality assessments, with algorithms backed by theoretical guarantees.
Contribution
The paper proposes novel algorithms for efficient approximate queries over ML models using proxies, with theoretical guarantees and empirical validation on real datasets.
Findings
Algorithms outperform state-of-the-art methods.
High-quality answers achieved with minimal oracle calls.
Provable statistical guarantees for answer quality.
Abstract
The question of answering queries over ML predictions has been gaining attention in the database community. This question is challenging because the cost of finding high-quality answers amounts to invoking an oracle, such as a human expert or an expensive deep neural network model, on every single item in the DB and then applying the query. We develop a novel unified framework for approximate query answering that leverages a proxy to minimize oracle usage in finding high-quality answers for both Precision-Target (PT) and Recall-Target (RT) queries. Our framework uses a judicious combination of invoking the expensive oracle on data samples and applying the cheap proxy to the objects in the DB. It relies on two assumptions. Under the Proxy Quality assumption, proxy quality can be quantified in a probabilistic manner w.r.t. the oracle. This allows us to develop two algorithms: PQA…
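The abstract is truncated before it describes PQA or the other algorithms, so the following is not the authors' method. It is a minimal, hypothetical sketch of the general pattern the abstract names: spend an oracle budget on a uniform random sample, pick a proxy-score threshold from those labels, then apply the cheap proxy to every object in the DB. The function names (`sample_oracle_labels`, `pt_threshold`, `rt_threshold`), the toy scores and oracle, and the plain empirical thresholding rule are all assumptions. Unlike the paper's algorithms, this sketch ignores sampling error and so carries no statistical guarantee.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Labelled = List[Tuple[int, bool]]  # (item id, oracle label) pairs


def sample_oracle_labels(n_items: int, oracle: Callable[[int], bool],
                         budget: int, seed: int = 0) -> Labelled:
    """Spend the whole oracle budget on a uniform random sample of item ids."""
    rng = random.Random(seed)
    return [(i, oracle(i)) for i in rng.sample(range(n_items), budget)]


def pt_threshold(scores: Sequence[float], labelled: Labelled,
                 target: float) -> float:
    """Precision-Target: lowest threshold whose sample precision meets target."""
    for tau in sorted({scores[i] for i, _ in labelled}):
        # Non-empty, because tau is the score of at least one sampled item.
        kept = [y for i, y in labelled if scores[i] >= tau]
        if sum(kept) / len(kept) >= target:
            return tau
    return math.inf  # no threshold qualifies: return an empty answer


def rt_threshold(scores: Sequence[float], labelled: Labelled,
                 target: float) -> float:
    """Recall-Target: highest threshold keeping >= target of sampled positives."""
    pos = sorted((scores[i] for i, y in labelled if y), reverse=True)
    if not pos:
        return -math.inf  # no positives observed: keep every item
    k = max(1, math.ceil(target * len(pos)))
    return pos[k - 1]  # at least k sampled positives score >= this value


# Toy usage (hypothetical data): proxy scores rise with the item id and the
# oracle marks items scoring at least 0.7 as positive.
n = 1000
scores = [i / n for i in range(n)]


def oracle(i: int) -> bool:
    return scores[i] >= 0.7


labelled = sample_oracle_labels(n, oracle, budget=200)
tau_pt = pt_threshold(scores, labelled, target=0.9)
tau_rt = rt_threshold(scores, labelled, target=0.9)
# Only the proxy is applied to the full DB; the oracle saw 200 items.
answer_pt = [i for i in range(n) if scores[i] >= tau_pt]
answer_rt = [i for i in range(n) if scores[i] >= tau_rt]
```

The oracle is called only `budget` times, while the proxy scores all `n` items. Because the thresholds are chosen from raw sample estimates, the precision or recall reached on the full DB can fall below the target. A method with provable guarantees would, for example, select thresholds using confidence bounds rather than point estimates.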
Taxonomy
Topics: Data Management and Algorithms · Data Quality and Management · Advanced Graph Neural Networks
