Reducing the Footprint of Multi-Vector Retrieval with Minimal Performance Impact via Token Pooling
Benjamin Clavié, Antoine Chaffin, Griffin Adams

TL;DR
This paper presents a clustering-based token pooling method that significantly reduces the storage requirements of multi-vector retrieval systems like ColBERT, with minimal impact on retrieval performance, enabling more practical deployment.
Contribution
A simple, architecture-agnostic token pooling technique that roughly halves the index footprint of multi-vector retrieval with virtually no loss in retrieval performance.
Findings
Reduces index size by 50% with negligible performance loss
Further reductions of 66%-75% with less than 5% degradation
No changes needed to existing models or query processing
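The reduction figures above follow from simple arithmetic if pooling keeps about one vector for every k token vectors. The sketch below assumes exactly that; the name `pool_factor` and the 300-token document are illustrative choices, not values taken from the paper.

```python
def vectors_kept(n_tokens: int, pool_factor: int) -> int:
    """Vectors stored for one document if every `pool_factor` tokens share one vector.

    `pool_factor` is a hypothetical parameter name used for illustration.
    """
    return max(1, n_tokens // pool_factor)


def reduction(n_tokens: int, pool_factor: int) -> float:
    """Fraction of vectors removed from the index for one document."""
    return 1 - vectors_kept(n_tokens, pool_factor) / n_tokens


# Illustrative 300-token document; the length is an assumption.
for k in (2, 3, 4):
    kept = vectors_kept(300, k)
    print(f"pool_factor={k}: {kept} of 300 vectors kept, {reduction(300, k):.1%} fewer")
```

Under this assumption, factors of 2, 3, and 4 correspond to the 50%, roughly 66%, and 75% reductions listed in the findings.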
Abstract
Over the last few years, multi-vector retrieval methods, spearheaded by ColBERT, have become an increasingly popular approach to Neural IR. By storing representations at the token level rather than at the document level, these methods have demonstrated very strong retrieval performance, especially in out-of-domain settings. However, the storage and memory requirements necessary to store the large number of associated vectors remain an important drawback, hindering practical adoption. In this paper, we introduce a simple clustering-based token pooling approach to aggressively reduce the number of vectors that need to be stored. This method can reduce the space & memory footprint of ColBERT indexes by 50% with virtually no retrieval performance degradation. This method also allows for further reductions, reducing the vector count by 66% to 75%, with degradation remaining below 5% on a…
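The clustering-based pooling described in the abstract can be illustrated with a minimal sketch. The excerpt does not say which clustering algorithm the authors use, so the code below substitutes a naive greedy agglomerative merge on cosine similarity, followed by mean pooling of each cluster. The function names `l2_normalize` and `pool_tokens`, the `pool_factor` parameter, and the random embeddings are all hypothetical, introduced only for illustration.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length (guarding against zero rows)."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)


def pool_tokens(token_embeddings: np.ndarray, pool_factor: int = 2) -> np.ndarray:
    """Reduce one document's token vectors by clustering and mean pooling.

    Assumption: a naive greedy agglomerative merge stands in for whatever
    clustering the paper actually uses. `pool_factor` is a hypothetical name.
    """
    n_tokens = token_embeddings.shape[0]
    target = max(1, n_tokens // pool_factor)
    # Start with every token in its own cluster.
    clusters = [[i] for i in range(n_tokens)]

    while len(clusters) > target:
        # Represent each cluster by its normalized mean vector.
        centroids = l2_normalize(
            np.stack([token_embeddings[c].mean(axis=0) for c in clusters])
        )
        sims = centroids @ centroids.T
        np.fill_diagonal(sims, -np.inf)  # never merge a cluster with itself
        i, j = np.unravel_index(np.argmax(sims), sims.shape)
        i, j = int(min(i, j)), int(max(i, j))
        # Merge the two most similar clusters.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]

    # One mean-pooled, re-normalized vector per remaining cluster.
    pooled = np.stack([token_embeddings[c].mean(axis=0) for c in clusters])
    return l2_normalize(pooled)


# Illustrative input: 32 random 16-dimensional "token" vectors.
rng = np.random.default_rng(0)
doc = l2_normalize(rng.normal(size=(32, 16)))
pooled = pool_tokens(doc, pool_factor=2)
print(doc.shape, "->", pooled.shape)
```

Because pooling in this sketch touches only the stored document vectors, it is consistent with the finding that query processing and the underlying model need no changes.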
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Machine Learning and Algorithms · Machine Learning and Data Classification
