Practical Top-K Document Retrieval in Reduced Space
Gonzalo Navarro, Daniel Valenzuela

TL;DR
This paper introduces new reduced-space data structures for top-k document retrieval. They address the high space usage of existing time-optimal solutions and achieve better space/time tradeoffs in the authors' experiments.
Contribution
The paper studies several reduced-space structures that support top-k retrieval and proposes new algorithms and data structures as alternatives that use less space while keeping queries efficient.
Findings
New algorithms outperform existing methods in space efficiency.
In the experiments, the new structures dominate almost all of the space/time tradeoff.
Proposed structures are practical for real-world text databases.
Abstract
Supporting top-k document retrieval queries on general text databases, that is, finding the k documents where a given pattern occurs most frequently, has become a topic of interest with practical applications. While the problem has been solved in optimal time and linear space, the actual space usage is a serious concern. In this paper we study various reduced-space structures that support top-k retrieval and propose new alternatives. Our experimental results show that our novel algorithms and data structures dominate almost all the space/time tradeoff.
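To make the query definition in the abstract concrete, the following is a minimal brute-force sketch in Python. It is not one of the paper's data structures: it scans every document at query time, which is exactly the cost that indexed solutions avoid. The function names `count_occurrences` and `top_k_documents`, the choice to count overlapping occurrences, and the tie-breaking rule by lower document id are assumptions made for illustration only.

```python
import heapq
from typing import List, Tuple


def count_occurrences(text: str, pattern: str) -> int:
    """Count occurrences of pattern in text, allowing overlaps (assumed here)."""
    if not pattern:
        return 0
    count = 0
    start = text.find(pattern)
    while start != -1:
        count += 1
        # Advance by one position so overlapping matches are also counted.
        start = text.find(pattern, start + 1)
    return count


def top_k_documents(docs: List[str], pattern: str, k: int) -> List[Tuple[int, int]]:
    """Return up to k (doc_id, frequency) pairs, most frequent first.

    Documents where the pattern does not occur are excluded.
    Ties are broken by lower document id (an illustrative choice).
    """
    hits = []
    for doc_id, text in enumerate(docs):
        freq = count_occurrences(text, pattern)
        if freq > 0:
            hits.append((freq, doc_id))
    # Select the k best by (higher frequency, then lower id).
    best = heapq.nsmallest(k, hits, key=lambda t: (-t[0], t[1]))
    return [(doc_id, freq) for freq, doc_id in best]


if __name__ == "__main__":
    collection = ["abracadabra", "banana", "cabana", "xyz"]
    print(top_k_documents(collection, "a", 2))
```

This baseline needs no index beyond the raw text, but each query takes time proportional to the total collection size. The structures studied in the paper instead answer the same query from a compact index.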
Taxonomy
Topics: Algorithms and Data Compression · Web Data Mining and Analysis · Advanced Image and Video Retrieval Techniques
