Multi-Vector Index Compression in Any Modality
Hanxiang Qin, Alexander Martin, Rohan Jha, Chunsheng Zuo, Reno Kriz, Benjamin Van Durme

TL;DR
This paper introduces novel index compression techniques for multi-vector retrieval across various modalities, significantly reducing storage costs while maintaining or improving retrieval performance.
Contribution
The paper proposes four index compression methods, including a new attention-guided clustering (AGC) approach, that apply to text, images, and videos, and evaluates them comprehensively across multiple datasets.
Findings
Attention-guided clustering outperforms other compression methods
Compressed indexes achieve retrieval performance comparable to, or better than, uncompressed indexes
Methods are effective across text, visual, and video modalities
Abstract
We study efficient multi-vector retrieval for late interaction in any modality. Late interaction has emerged as a dominant paradigm for information retrieval in text, images, visual documents, and videos, but its computation and storage costs grow linearly with document length, making it costly for image-, video-, and audio-rich corpora. To address this limitation, we explore query-agnostic methods for compressing multi-vector document representations under a constant vector budget. We introduce four approaches for index compression: sequence resizing, memory tokens, hierarchical pooling, and a novel attention-guided clustering (AGC). AGC uses an attention-guided mechanism to identify the most semantically salient regions of a document as cluster centroids and to weight token aggregation. Evaluating these methods on retrieval tasks spanning text (BEIR), visual-document (ViDoRe), and…
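The abstract describes two mechanisms: late-interaction scoring, whose storage and compute grow with the number of document vectors, and AGC, which picks salient tokens as cluster centroids and uses attention to weight aggregation under a fixed vector budget. The NumPy code below is a minimal sketch of both ideas under stated assumptions. It is not the authors' implementation. The function names `maxsim_score` and `attention_guided_compress` are hypothetical, and so are several modeling choices: a precomputed per-token `saliency` vector, top-k centroid selection, nearest-centroid assignment by cosine similarity, and a saliency-weighted mean. The paper derives saliency from model attention and may differ in detail.

```python
import numpy as np


def maxsim_score(query: np.ndarray, doc: np.ndarray) -> float:
    """ColBERT-style late-interaction (MaxSim) score.

    query: (m, d) L2-normalized query token vectors.
    doc:   (n, d) L2-normalized document vectors.
    Each query token takes its best-matching document vector; the scores are summed.
    Cost and index storage scale with n, the number of document vectors.
    """
    sims = query @ doc.T  # (m, n) cosine similarities
    return float(sims.max(axis=1).sum())


def attention_guided_compress(doc: np.ndarray, saliency: np.ndarray, k: int) -> np.ndarray:
    """Hypothetical AGC-style compression of a document to at most k vectors.

    doc:      (n, d) L2-normalized token embeddings.
    saliency: (n,) non-negative per-token scores (assumed given, e.g. from attention).
    Returns a (min(k, n), d) array of L2-normalized vectors.
    """
    n = doc.shape[0]
    k = min(k, n)
    # 1. The k most salient tokens become cluster centroids.
    centroid_idx = np.argsort(-saliency)[:k]
    centroids = doc[centroid_idx]  # (k, d)
    # 2. Assign each token to its most similar centroid (cosine similarity).
    assign = (doc @ centroids.T).argmax(axis=1)  # (n,)
    # Ensure every centroid token belongs to its own cluster, so no cluster is empty.
    assign[centroid_idx] = np.arange(k)
    # 3. Aggregate each cluster as a saliency-weighted mean of its members.
    out = np.zeros_like(centroids)
    for c in range(k):
        members = assign == c
        w = saliency[members] + 1e-12  # small epsilon avoids a zero total weight
        out[c] = (w[:, None] * doc[members]).sum(axis=0) / w.sum()
    # Re-normalize so MaxSim still compares cosine similarities.
    norms = np.maximum(np.linalg.norm(out, axis=1, keepdims=True), 1e-12)
    return out / norms


if __name__ == "__main__":
    rng = np.random.default_rng(0)

    def l2n(x: np.ndarray) -> np.ndarray:
        return x / np.linalg.norm(x, axis=-1, keepdims=True)

    doc = l2n(rng.normal(size=(300, 128)))   # a 300-token document
    query = l2n(rng.normal(size=(32, 128)))  # a 32-token query
    saliency = rng.random(300)               # stand-in attention scores

    compressed = attention_guided_compress(doc, saliency, k=32)
    print(compressed.shape)  # (32, 128)
    print(maxsim_score(query, doc), maxsim_score(query, compressed))
```

Under a constant budget of k vectors, every document stores the same number of vectors regardless of its length, which is what keeps index size bounded for long image- or video-derived token sequences. The scoring function is unchanged; only the document side is compressed, so the compression stays query-agnostic and can run once at indexing time.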
Code & Models
- hltcoe/AGC_qwen2.5-vl_msrvtt (model)
- hltcoe/AGC_qwen2.5-vl_msrvtt-7b (model)
- hltcoe/AGC_qwen3-vl_msrvtt (model)
- hltcoe/ColBERT_qwen2.5-vl_msrvtt (model)
- hltcoe/MemTok_qwen2.5-vl_msrvtt (model)
- hltcoe/SeqResize_qwen2.5-vl_msrvtt (model)
- hltcoe/AGC_qwen2.5-vl_colpali (model)
- hltcoe/ColBERT_qwen2.5-vl_colpali (model)
- hltcoe/MemTok_qwen2.5-vl_colpali (model)
- hltcoe/SeqReSize_qwen2.5-vl_colpali (model)
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Video Analysis and Summarization
