Forward Index Compression for Learned Sparse Retrieval
Sebastian Bruch, Martino Fontana, Franco Maria Nardini, Cosimo Rulli, and Rossano Venturini

TL;DR
This paper investigates efficient compression methods for the forward index in learned sparse text retrieval, introducing DotVByte to reduce storage without sacrificing retrieval speed or accuracy.
Contribution
It presents DotVByte, a new compression algorithm for the forward index that is designed for fast inner product computation and improves space efficiency in sparse retrieval systems.
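To illustrate what "optimized for inner product calculations" can mean, the sketch below computes a sparse inner product directly while decoding a document's compressed component ids, without first materializing the ids in memory. It is an assumption-laden illustration, not DotVByte itself: it uses classic scalar variable-byte coding of delta gaps (7 payload bits per byte, high bit set on a value's last byte, which is one of several common conventions), and the function names `vbyte_encode`, `encode_doc`, and `fused_dot` are hypothetical.

```python
from typing import Sequence


def vbyte_encode(values: Sequence[int]) -> bytes:
    """Classic variable-byte code: 7 payload bits per byte, low bits first.

    The high bit marks the last byte of each value (one common convention).
    """
    out = bytearray()
    for v in values:
        if v < 0:
            raise ValueError("values must be non-negative")
        while v >= 128:
            out.append(v & 0x7F)  # continuation byte, high bit clear
            v >>= 7
        out.append(v | 0x80)  # terminating byte, high bit set
    return bytes(out)


def encode_doc(ids: Sequence[int]) -> bytes:
    """Delta-gap strictly increasing component ids, then variable-byte encode them."""
    ids = list(ids)
    gaps = [c - p for p, c in zip([0] + ids[:-1], ids)]
    return vbyte_encode(gaps)


def fused_dot(encoded_ids: bytes, values: Sequence[float],
              query_dense: Sequence[float]) -> float:
    """Decode gaps and accumulate sum(q[id] * value) in a single pass."""
    total = 0.0
    current_id = 0  # running prefix sum of gaps = current component id
    gap = 0
    shift = 0
    k = 0  # position in the parallel `values` array
    for byte in encoded_ids:
        gap |= (byte & 0x7F) << shift
        if byte & 0x80:  # last byte of this gap: one component is complete
            current_id += gap
            total += query_dense[current_id] * values[k]
            k += 1
            gap, shift = 0, 0
        else:
            shift += 7
    return total
```

Usage: with document ids `[3, 10, 300]`, values `[0.5, 1.0, 2.0]`, and a dense query holding `1.0` at id 10 and `0.25` at id 300, `fused_dot(encode_doc([3, 10, 300]), [0.5, 1.0, 2.0], q)` returns `1.5`. The gaps `[3, 7, 290]` take 4 bytes instead of the 12 needed for three 32-bit ids.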
Findings
StreamVByte offers the best trade-off among memory, accuracy, and latency.
DotVByte significantly reduces the storage footprint of the forward index.
The proposed methods maintain retrieval efficiency on MsMarco.
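For readers unfamiliar with StreamVByte, the following minimal pure-Python sketch shows its byte layout: integers are processed in groups of four, each group gets one control byte holding four 2-bit codes (byte length minus one, so 1 to 4 bytes per integer), and the control bytes are kept in a stream separate from the little-endian data bytes. Placing the first integer's code in the lowest two bits is our assumption about bit order; production implementations decode with SIMD shuffles, and this scalar code only illustrates the format, not the paper's implementation.

```python
from typing import List, Sequence, Tuple


def _byte_len(x: int) -> int:
    """Number of bytes (1..4) needed to store a 32-bit unsigned integer."""
    if x < 1 << 8:
        return 1
    if x < 1 << 16:
        return 2
    if x < 1 << 24:
        return 3
    return 4


def streamvbyte_encode(values: Sequence[int]) -> Tuple[bytes, bytes]:
    """Return (control_stream, data_stream) for a list of uint32 values."""
    control = bytearray()
    data = bytearray()
    for start in range(0, len(values), 4):
        ctrl = 0
        for i, v in enumerate(values[start:start + 4]):
            if not 0 <= v < 1 << 32:
                raise ValueError("values must fit in 32 unsigned bits")
            n = _byte_len(v)
            ctrl |= (n - 1) << (2 * i)  # 2-bit length code for slot i
            data += v.to_bytes(n, "little")
        control.append(ctrl)  # unused slots of a final partial group stay 0
    return bytes(control), bytes(data)


def streamvbyte_decode(control: bytes, data: bytes, count: int) -> List[int]:
    """Invert streamvbyte_encode; `count` tells how many values are real."""
    out: List[int] = []
    pos = 0
    for ctrl in control:
        for i in range(4):
            if len(out) == count:
                break
            n = ((ctrl >> (2 * i)) & 0b11) + 1
            out.append(int.from_bytes(data[pos:pos + n], "little"))
            pos += n
    return out
```

Usage: encoding `[1, 300, 70000, 16777216, 5]` yields 2 control bytes (`228` and `0`) plus 11 data bytes, 13 bytes in total versus 20 for raw 32-bit integers. Keeping the control stream separate is what lets SIMD decoders look up a shuffle mask from a single control byte.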
Abstract
Text retrieval using learned sparse representations of queries and documents has, over the years, evolved into a highly effective approach to search. It is thanks to recent advances in approximate nearest neighbor search, with the emergence of highly efficient algorithms such as the inverted index-based Seismic and the graph-based HNSW, that retrieval with sparse representations became viable in practice. In this work, we scrutinize the efficiency of sparse retrieval algorithms and focus on the size of a data structure that is common to all algorithmic flavors and that constitutes a substantial fraction of the overall index size: the forward index. In particular, we seek compression techniques that reduce the storage footprint of the forward index without compromising search quality or inner product computation latency. In our examination with various integer compression…
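As a point of reference for what is being compressed, here is a minimal, uncompressed forward index sketch: each document is stored as strictly increasing component ids plus one value per id, and candidates are scored by an exact inner product against a query scattered into a dense array. The class `ForwardIndex`, the helpers `to_gaps` and `rerank`, and the dense-query strategy are illustrative assumptions, not the data structures used by Seismic, HNSW, or the paper.

```python
from typing import Dict, List, Sequence, Tuple


class ForwardIndex:
    """Hypothetical uncompressed forward index: doc id -> sparse vector."""

    def __init__(self) -> None:
        self.docs: List[Tuple[List[int], List[float]]] = []

    def add(self, vector: Dict[int, float]) -> int:
        ids = sorted(vector)  # strictly increasing component ids
        self.docs.append((ids, [vector[c] for c in ids]))
        return len(self.docs) - 1

    def inner_product(self, query_dense: Sequence[float], doc_id: int) -> float:
        # One dense-array lookup per non-zero document component.
        ids, values = self.docs[doc_id]
        return sum(query_dense[c] * v for c, v in zip(ids, values))


def to_gaps(ids: Sequence[int]) -> List[int]:
    """Delta-encode sorted ids; small gaps suit byte-aligned integer codes."""
    ids = list(ids)
    return [c - p for p, c in zip([0] + ids[:-1], ids)]


def rerank(index: ForwardIndex, query: Dict[int, float],
           candidates: Sequence[int], dim: int, k: int) -> List[Tuple[float, int]]:
    """Score candidate documents exactly and return the top-k (score, doc_id)."""
    dense = [0.0] * dim
    for c, w in query.items():
        dense[c] = w
    scored = sorted(((index.inner_product(dense, d), d) for d in candidates),
                    reverse=True)
    return scored[:k]
```

Usage: after adding `{3: 0.5, 10: 1.0, 42: 2.0}` and `{10: 3.0, 7: 1.0}`, the query `{10: 1.0, 42: 0.5}` scores the documents `2.0` and `3.0`, so `rerank(..., k=1)` returns `[(3.0, 1)]`. The component ids of the first document become the gaps `[3, 7, 32]`, and it is lists like these (plus the values) that integer compression of the forward index targets.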
