Beyond Quantile Methods: Improved Top-K Threshold Estimation for Traditional and Learned Sparse Indexes
Jinrui Gou, Yifan Liu, Minghao Shao, Torsten Suel

TL;DR
This paper improves top-k threshold estimation accuracy for both traditional and learned sparse indexes by enhancing quantile-based methods, leading to more precise estimates with manageable additional computational costs.
Contribution
It introduces a series of enhancements to quantile methods for better threshold estimation and demonstrates their effectiveness on learned sparse index structures.
Findings
The enhanced quantile methods outperform previous approaches in terms of mean under-prediction fraction (MUF).
They significantly narrow the gap to the ideal MUF of 1.0.
They are effective on both traditional and learned sparse indexes.
Abstract
Top-k threshold estimation is the problem of estimating the score of the k-th highest ranking result of a search query. A good estimate can be used to speed up many common top-k query processing algorithms, and thus a number of researchers have recently studied the problem. Among the various approaches that have been proposed, quantile methods appear to give the best estimates overall at modest computational costs, followed by sampling-based methods in certain cases. In this paper, we make two main contributions. First, we study how to get even better estimates than the state of the art. Starting from quantile-based methods, we propose a series of enhancements that give improved estimates in terms of the commonly used mean under-prediction fraction (MUF). Second, we study the threshold estimation problem on recently proposed learned sparse index structures, showing that our methods also…
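To make the problem statement concrete, the following is a minimal Python sketch of the true top-k threshold, a basic quantile-style estimate, and the MUF metric on a toy index. It is not the authors' implementation and does not include the enhancements the paper proposes. Several things here are assumptions for illustration: the toy index and all function names are hypothetical; scoring is assumed to be a non-negative sum of per-term scores over disjunctive queries; the basic quantile estimate is assumed to be the largest stored k-th highest per-term score among the query terms; and MUF is assumed to be the mean of estimate / true threshold over queries where the estimate does not exceed the true threshold.

```python
import heapq


def kth_highest(scores, k):
    """Return the k-th highest value in scores, or 0.0 if there are fewer than k."""
    top = heapq.nlargest(k, scores)
    return top[-1] if len(top) == k else 0.0


def true_threshold(index, query, k):
    """Exact top-k threshold: k-th highest summed score over all matching documents."""
    totals = {}
    for term in query:
        for doc, score in index.get(term, {}).items():
            totals[doc] = totals.get(doc, 0.0) + score
    return kth_highest(totals.values(), k)


def quantile_estimate(index, query, k):
    """Basic quantile-style estimate (assumed form): the maximum, over query terms,
    of the k-th highest score in that term's posting list. With non-negative
    additive scores this never exceeds the true threshold."""
    return max(
        (kth_highest(index.get(term, {}).values(), k) for term in query),
        default=0.0,
    )


def muf(estimates, truths):
    """Mean under-prediction fraction (assumed definition): mean of estimate / truth
    over queries whose estimate does not exceed the true threshold."""
    ratios = [e / t for e, t in zip(estimates, truths) if t > 0 and e <= t]
    return sum(ratios) / len(ratios) if ratios else float("nan")


# Hypothetical toy index: term -> {doc_id: score}
toy_index = {
    "a": {1: 3.0, 2: 2.0, 3: 1.0},
    "b": {2: 2.5, 4: 0.5},
}
toy_query = ["a", "b"]

truth = true_threshold(toy_index, toy_query, k=2)
estimate = quantile_estimate(toy_index, toy_query, k=2)
score = muf([estimate], [truth])
```

In this toy case the estimate falls below the true threshold, so the MUF is less than 1.0; an estimator that always returned the exact threshold would reach the ideal value of 1.0 mentioned in the findings.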
Taxonomy
Topics: Neural Networks and Applications · Machine Learning and Data Classification · Statistical Methods and Inference
