Exploiting Correlations for Expensive Predicate Evaluation
Manas Joglekar, Hector Garcia-Molina, Aditya Parameswaran, Christopher Ré

TL;DR
This paper presents techniques to efficiently evaluate selection queries with expensive UDF predicates, achieving up to 80% cost savings while maintaining user-specified accuracy constraints.
Contribution
It introduces a family of methods for low-cost query processing with UDF predicates, applicable under various prior information scenarios and for complex queries.
Findings
Achieves up to 80% cost reduction on real datasets
Keeps precision and recall within user-specified constraints
Applies when prior selection probabilities are exact, noisy, or unavailable
Abstract
User-Defined Functions (UDFs) are increasingly used to augment query languages with extra, application-dependent functionality. Selection queries involving UDF predicates tend to be expensive, either in terms of monetary cost or latency. In this paper, we study ways to efficiently evaluate selection queries with UDF predicates. We provide a family of techniques for processing queries at low cost while satisfying user-specified precision and recall constraints. Our techniques are applicable to a variety of scenarios, including when selection probabilities of tuples are available beforehand, when this information is available but noisy, or when no such prior information is available. We also generalize our techniques to more complex queries. Finally, we test our techniques on real datasets, and show that they achieve significant savings in cost of up to 80%, while incurring only a small…
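To make the setting concrete, here is a minimal sketch of the simplest scenario the abstract mentions, where a selection probability is known for every tuple beforehand. It is an illustration only and not the paper's algorithm. The function name `threshold_plan`, the unit cost per UDF call, and the use of a ratio of expectations as a stand-in for expected precision and recall are all assumptions made for this example. The idea is to return the most probable tuples without calling the UDF while the precision target still holds, then pay for UDF calls on the next most probable tuples only until the recall target is reached.

```python
from typing import List, Tuple


def threshold_plan(
    probs: List[float], min_precision: float, min_recall: float
) -> Tuple[List[int], List[int]]:
    """Hypothetical greedy plan, not the paper's method.

    probs[i] is the assumed-known probability that tuple i satisfies
    the expensive UDF predicate. Returns (accepted, evaluated):
      accepted  - indices returned WITHOUT calling the UDF
      evaluated - indices on which the UDF is called (unit cost each);
                  only those that pass are returned
    All remaining tuples are discarded without evaluation.
    """
    # Visit tuples from most to least likely to satisfy the predicate.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    total = sum(probs)  # expected number of true matches overall
    accepted: List[int] = []
    evaluated: List[int] = []
    hits = 0.0  # expected number of true matches covered so far
    if total == 0:
        return accepted, evaluated

    for i in order:
        # Approximate expected recall as covered matches / all matches.
        if hits / total >= min_recall:
            break
        # Approximate precision if tuple i were also accepted blindly.
        blind_precision = (hits + probs[i]) / (len(accepted) + 1)
        if not evaluated and blind_precision >= min_precision:
            accepted.append(i)
        else:
            # Evaluated tuples are returned only if they truly match,
            # so they raise recall without hurting precision.
            evaluated.append(i)
        hits += probs[i]
    return accepted, evaluated


if __name__ == "__main__":
    accepted, evaluated = threshold_plan([0.9, 0.8, 0.5, 0.2, 0.1], 0.8, 0.8)
    print("return without UDF:", accepted)
    print("call UDF on:", evaluated)
```

In this toy input, two tuples are returned blindly and the UDF is called on one tuple instead of all five. The greedy order is a simplification chosen for readability; it carries no optimality guarantee and does not cover the noisy or missing probability scenarios.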
Taxonomy
Topics: Advanced Database Systems and Queries · Data Management and Algorithms · Semantic Web and Ontologies
