Approximate Top-k Retrieval from Hidden Relations
Antti Ukkonen

TL;DR
This paper introduces a cost-effective method for approximate top-k retrieval from hidden relations. It combines regression models with prior information about the data, and is reported to outperform existing algorithms in a Wikipedia search context.
Contribution
The paper presents a novel algorithm that uses regression models trained on prior data to answer approximate top-k queries over hidden relations at low cost.
Findings
The proposed method reduces query costs significantly.
It maintains high accuracy in top-k retrieval.
It outperforms baseline algorithms in experiments.
Abstract
We consider the evaluation of approximate top-k queries from relations with a-priori unknown values. Such relations can arise for example in the context of expensive predicates, or cloud-based data sources. The task is to find an approximate top-k set that is close to the exact one while keeping the total processing cost low. The cost of a query is the sum of the costs of the entries that are read from the hidden relation. A novel aspect of this work is that we consider prior information about the values in the hidden matrix. We propose an algorithm that uses regression models at query time to assess whether a row of the matrix can enter the top-k set given that only a subset of its values are known. The regression models are trained with existing data that follows the same distribution as the relation subjected to the query. To evaluate the algorithm and to compare it with a method…
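The idea in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes a row's score is the sum of its values, that every read costs one unit, and that a row is pruned when an optimistic regression estimate of its final score falls below the current k-th best score. The names `fit_line`, `train_models`, `approx_top_k`, and `read_entry` are hypothetical and chosen for illustration only.

```python
import heapq
import random


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        return mean_y, 0.0
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return mean_y - b * mean_x, b


def train_models(train_rows):
    """One model per prefix length j: predict the full row sum from the
    sum of the first j values. The slack is the largest absolute training
    residual (an assumed, heuristic safety margin)."""
    n_cols = len(train_rows[0])
    totals = [sum(row) for row in train_rows]
    models = []
    for j in range(1, n_cols):
        partial = [sum(row[:j]) for row in train_rows]
        a, b = fit_line(partial, totals)
        slack = max(abs(t - (a + b * p)) for p, t in zip(partial, totals))
        models.append((a, b, slack))
    return models


def approx_top_k(read_entry, n_rows, n_cols, k, models):
    """Scan rows, reading entries one at a time and abandoning a row as
    soon as its optimistic estimate cannot beat the current k-th best."""
    heap = []  # min-heap of (score, row index) for the current top-k
    cost = 0   # unit cost per entry read (assumption)
    for i in range(n_rows):
        partial = 0.0
        pruned = False
        for j in range(n_cols):
            partial += read_entry(i, j)
            cost += 1
            if j + 1 < n_cols and len(heap) == k:
                a, b, slack = models[j]  # model for prefix length j + 1
                if a + b * partial + slack < heap[0][0]:
                    pruned = True
                    break
        if not pruned:
            if len(heap) < k:
                heapq.heappush(heap, (partial, i))
            elif partial > heap[0][0]:
                heapq.heapreplace(heap, (partial, i))
    return sorted(i for _, i in heap), cost


# Synthetic data: training rows and hidden rows share one distribution.
rng = random.Random(0)
train = [[rng.random() for _ in range(5)] for _ in range(200)]
hidden = [[rng.random() for _ in range(5)] for _ in range(100)]

models = train_models(train)
top, cost = approx_top_k(lambda i, j: hidden[i][j], 100, 5, 3, models)
print("approximate top-3 rows:", top, "entries read:", cost)
```

Because the slack comes from training residuals only, a pruned row could still belong to the exact top-k set, which is why the result is approximate. A larger slack prunes less and reads more entries; a smaller one lowers cost at the expense of accuracy.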
Taxonomy
Topics: Data Management and Algorithms · Data Stream Mining Techniques · Machine Learning and Algorithms
