Optimal Top-k Document Retrieval
Gonzalo Navarro, Yakov Nekrich

TL;DR
This paper introduces a linear-space data structure for top-k document retrieval whose query time is optimal in the RAM model. It supports a range of relevance measures and extends to dynamic collections (document insertions and deletions) and to range-restricted queries.
Contribution
It presents a novel RAM-optimal suffix tree search method and extends top-k retrieval to dynamic and range-restricted scenarios with improved space and time complexities.
Findings
Achieves optimal query time in the RAM model for static retrieval.
Supports dynamic document insertion and deletion with efficient update times.
Provides solutions for range-restricted top-k queries with linear space.
Abstract
Let $\mathcal{D}$ be a collection of $D$ documents, which are strings over an alphabet of size $\sigma$, of total length $n$. We describe a data structure that uses linear space and reports the $k$ most relevant documents that contain a query pattern $P$, which is a string of length $p$, in time $O(p/\log_\sigma n + k)$, which is optimal in the RAM model in the general case where $\lg D = \Theta(\log n)$, and involves a novel RAM-optimal suffix tree search. Our construction supports an ample set of important relevance measures... [clip] When $\lg D = o(\log n)$, we show how to reduce the space of the data structure from $O(n \log n)$ to $O(n(\log\sigma + \log D + \log\log n))$ bits... [clip] We also consider the dynamic scenario, where documents can be inserted and deleted from the collection. We obtain linear space and query time $O(p(\log\log n)^2/\log_\sigma n + \log n + k \log\log k)$,…
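To make the query in the abstract concrete, the following is a minimal brute-force sketch of top-k document retrieval. It is not the paper's data structure: it scans every document on each query, so its cost grows with the total collection length rather than with the pattern length and k. It assumes term frequency (the number of occurrences of the pattern in a document) as the relevance measure, and the names `count_occurrences` and `top_k_naive` are hypothetical, chosen only for illustration.

```python
import heapq


def count_occurrences(text: str, pattern: str) -> int:
    """Count (possibly overlapping) occurrences of pattern in text."""
    if not pattern:
        return 0
    count = 0
    start = text.find(pattern)
    while start != -1:
        count += 1
        # Advance by one position so overlapping matches are counted.
        start = text.find(pattern, start + 1)
    return count


def top_k_naive(docs: list[str], pattern: str, k: int) -> list[tuple[int, int]]:
    """Return up to k (doc_id, frequency) pairs, most frequent first.

    Brute-force baseline (an illustration, not the paper's method):
    every document is scanned on every query.
    """
    scored = []
    for doc_id, text in enumerate(docs):
        freq = count_occurrences(text, pattern)
        if freq > 0:  # only documents that contain the pattern qualify
            scored.append((doc_id, freq))
    # Highest frequency first; ties are broken by the smaller doc_id.
    return heapq.nsmallest(k, scored, key=lambda item: (-item[1], item[0]))


docs = ["abracadabra", "banana", "cabana", "xyz"]
print(top_k_naive(docs, "a", 2))    # [(0, 5), (1, 3)]
print(top_k_naive(docs, "ana", 5))  # [(1, 2), (2, 1)]
```

The point of the comparison is that this baseline reads the whole collection per query, whereas the structure described in the abstract answers the same query in time that depends only on the pattern length and k, apart from the logarithmic factor in the denominator.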
Taxonomy
Topics: Algorithms and Data Compression · DNA and Biological Computing · Advanced Data Storage Technologies
