Towards an Optimal Space-and-Query-Time Index for Top-k Document Retrieval
Wing-Kai Hon, Rahul Shah, Sharma V. Thankachan

TL;DR
This paper introduces a space-efficient index for top-k document retrieval that balances space and query time, improving on previous methods for relevance-based search in large document collections.
Contribution
It proposes a novel index structure that reduces space usage while maintaining efficient query times for top-k document retrieval based on term-frequency.
Findings
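The index size is that of a compressed full-text index of the collection plus an additive term.
Query time is governed by the time to search for the pattern plus a cost that grows with the number of documents reported.
A further space reduction is possible, at the price of a slower query time that depends on the alphabet size.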
Abstract
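Let $\mathcal{D} = \{d_1, d_2, \ldots, d_D\}$ be a given set of $D$ string documents of total length $n$. Our task is to index $\mathcal{D}$ such that the $k$ most relevant documents for an online query pattern $P$ of length $p$ can be retrieved efficiently. We propose an index of size $|CSA| + n\log D\,(2+o(1))$ bits and $O(t_s(p) + k\log\log n + \mathrm{poly}\log\log n)$ query time for the basic relevance metric \emph{term-frequency}, where $|CSA|$ is the size (in bits) of a compressed full text index of $\mathcal{D}$, with $O(t_s(p))$ time for searching a pattern of length $p$. We further reduce the space to $|CSA| + n\log D\,(1+o(1))$ bits; however, the query time becomes $O(t_s(p) + k(\log\sigma\log\log n)^{1+\epsilon} + \mathrm{poly}\log\log n)$, where $\sigma$ is the alphabet size and $\epsilon > 0$ is any constant.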
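To make the query itself concrete, the following is a minimal brute-force sketch of top-k retrieval under the term-frequency metric. It is not the index proposed in the paper: it scans every document on each query and serves only as a reference for what such an index must return. The function names, the tie-breaking rule (smaller document id first), and the choice to count overlapping occurrences are assumptions made for illustration.

```python
import heapq


def count_occurrences(text: str, pattern: str) -> int:
    """Count occurrences of pattern in text, allowing overlaps."""
    if not pattern:
        return 0
    count = 0
    start = text.find(pattern)
    while start != -1:
        count += 1
        # Advance by one character so overlapping matches are counted.
        start = text.find(pattern, start + 1)
    return count


def top_k_by_term_frequency(documents, pattern, k):
    """Return up to k (doc_id, term_frequency) pairs, highest frequency first.

    Brute-force baseline: scans all documents per query. Documents that do
    not contain the pattern are never reported. Ties are broken by smaller
    doc_id (an assumption, not taken from the paper).
    """
    scored = []
    for doc_id, text in enumerate(documents):
        tf = count_occurrences(text, pattern)
        if tf > 0:
            scored.append((tf, doc_id))
    best = heapq.nsmallest(k, scored, key=lambda item: (-item[0], item[1]))
    return [(doc_id, tf) for tf, doc_id in best]


if __name__ == "__main__":
    docs = ["banana", "ananas", "bandana", "cherry"]
    print(top_k_by_term_frequency(docs, "ana", 2))
```

The baseline spends time proportional to the total length of the collection on every query, whereas the bounds in the abstract depend on the pattern length and on $k$; that gap is what a dedicated index is meant to close.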
Taxonomy
Topics: Algorithms and Data Compression · Advanced Image and Video Retrieval Techniques · DNA and Biological Computing
