Threshold Queries in Theory and in the Wild
Angela Bonifati, Stefania Dumbrava, George Fletcher, Jan Hidders, Matthias Hofer, Wim Martens, Filip Murlak, Joshua Shinavier, Sławek Staworko, Dominik Tomaszuk

TL;DR
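This paper gives a theoretical and empirical analysis of threshold queries, which only require computing or counting answers up to a given threshold. It shows that thresholds can significantly improve the asymptotic bounds of state-of-the-art query evaluation algorithms and that such queries are common in practice.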
Contribution
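It presents a deep theoretical analysis of threshold query evaluation, a class of queries that the authors say has been largely disregarded in the research literature, and demonstrates empirically that these queries matter in real-world data scenarios.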
Findings
Exploiting thresholds can significantly improve the asymptotic bounds of state-of-the-art query evaluation algorithms.
Threshold queries are common and useful in practical data analysis.
Real-world data shows users often need results up to a threshold, regardless of ranking.
Abstract
Threshold queries are an important class of queries that only require computing or counting answers up to a specified threshold value. To the best of our knowledge, threshold queries have been largely disregarded in the research literature, which is surprising considering how common they are in practice. In this paper, we present a deep theoretical analysis of threshold query evaluation and show that thresholds can be used to significantly improve the asymptotic bounds of state-of-the-art query evaluation algorithms. We also empirically show that threshold queries are significant in practice. In surprising contrast to conventional wisdom, we found important scenarios in real-world data sets in which users are interested in computing the results of queries up to a certain threshold, independent of a ranking function that orders the query results by importance.
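To make the notion of "counting answers up to a threshold" concrete, here is a minimal Python sketch. It is not the evaluation algorithm from the paper; it only illustrates the basic idea that a lazily enumerated query can stop as soon as the threshold is reached. The function names (`count_up_to`, `at_least`, `two_hop_paths`) and the toy edge list are hypothetical and chosen purely for illustration.

```python
from itertools import islice


def count_up_to(answers, threshold):
    """Count answers from a lazy iterator, stopping once `threshold` are seen."""
    return sum(1 for _ in islice(answers, threshold))


def at_least(answers, threshold):
    """Return True if the query has at least `threshold` answers."""
    return count_up_to(answers, threshold) >= threshold


def two_hop_paths(edges, produced):
    """Lazily enumerate paths s -> mid -> t (a hypothetical example query).

    `produced` is a list that records every answer actually generated,
    so the caller can see how much work was done.
    """
    by_source = {}
    for s, t in edges:
        by_source.setdefault(s, []).append(t)
    for s, mid in edges:
        for t in by_source.get(mid, []):
            produced.append((s, mid, t))
            yield (s, mid, t)


# Toy graph: a hub node 0 connected to and from nodes 1..100.
edges = [(0, i) for i in range(1, 101)] + [(i, 0) for i in range(1, 101)]

# Full count: every answer is materialized.
all_produced = []
total = count_up_to(two_hop_paths(edges, all_produced), 10**9)

# Threshold count: enumeration stops after 10 answers.
few_produced = []
capped = count_up_to(two_hop_paths(edges, few_produced), 10)

print(total, len(all_produced))   # all answers generated
print(capped, len(few_produced))  # only 10 answers generated
```

In this sketch the saving comes only from stopping the enumeration early; the improved asymptotic bounds described in the abstract rely on the paper's own techniques, which are not reproduced here.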
Taxonomy
Topics: Data Management and Algorithms · Data Stream Mining Techniques · Bayesian Modeling and Causal Inference
