Incorporating Token Importance in Multi-Vector Retrieval

Archish S; Ankit Garg; Kirankumar Shiragur; Neeraj Kayal

arXiv:2511.16106·cs.IR·November 21, 2025

TL;DR

This paper enhances the ColBERT multi-vector retrieval model by incorporating token importance weights into the distance computation, improving retrieval performance on benchmark datasets.

Contribution

The paper introduces a weighted-sum extension of the Chamfer distance used in ColBERT: each query token's contribution is scaled by its importance, which improves retrieval accuracy without retraining the document representations.
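One way to write the change described above, as a sketch rather than the paper's own notation: the symbols $q_i$ (query token vectors), $d_j$ (document token vectors), and $w_i$ (per-query-token importance weights) are assumed here, and the Chamfer-style score is written in ColBERT's similarity form (maximum inner product) rather than as a distance.

```latex
\[
\mathrm{score}(Q, D) = \sum_{i=1}^{|Q|} \max_{1 \le j \le |D|} \langle q_i, d_j \rangle
\qquad\longrightarrow\qquad
\mathrm{score}_w(Q, D) = \sum_{i=1}^{|Q|} w_i \max_{1 \le j \le |D|} \langle q_i, d_j \rangle
\]
```

Setting every $w_i = 1$ recovers the unweighted score, so the weighted form is a strict generalization under these assumptions.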

Findings

01 IDF weights improve Recall@10 by 1.28% in the zero-shot setting.

02 Few-shot fine-tuning of the token weights yields a 3.66% higher Recall@10.

03 The method stays efficient because the multi-vector representations are kept fixed during training.
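The IDF weights mentioned in the first finding could be derived along the following lines. This is a minimal sketch: the page does not state which IDF variant the authors use, so the plain `log(N / df)` form, the function names `idf_weights` and `query_token_weights`, and the `default` value for unseen tokens are all assumptions made for illustration.

```python
import math
from collections import Counter


def idf_weights(tokenized_docs):
    """Return {token: log(N / df)} over a corpus of tokenized documents.

    Assumption: plain, unsmoothed IDF; the paper may use another variant.
    """
    n_docs = len(tokenized_docs)
    doc_freq = Counter()
    for doc in tokenized_docs:
        # Count each token once per document, regardless of repetitions.
        doc_freq.update(set(doc))
    return {tok: math.log(n_docs / df) for tok, df in doc_freq.items()}


def query_token_weights(query_tokens, idf, default=0.0):
    """Look up one weight per query token; unseen tokens get `default`."""
    return [idf.get(tok, default) for tok in query_tokens]
```

Because the weights depend only on corpus statistics, nothing is trained here, which is consistent with the zero-shot setting named in the finding.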

Abstract

ColBERT introduced a late interaction mechanism that independently encodes queries and documents using BERT and computes similarity via fine-grained interactions over token-level vector representations. This design enables expressive matching while allowing efficient computation of scores, because the multi-vector document representations can be precomputed offline. ColBERT models distance using a Chamfer-style function: for each query token, it selects the closest document token and sums these distances across all query tokens. In our work, we explore enhancements to the Chamfer distance function by computing a weighted sum over query token contributions, where the weights reflect token importance. Empirically, we show that this simple extension, requiring only token-weight training while keeping the multi-vector representations fixed, further enhances the expressiveness of late…
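The scoring rule described in the abstract can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the authors' implementation: the function name `weighted_chamfer_score` is hypothetical, token vectors are assumed to be already encoded, and "closest" is interpreted as the maximum inner product (ColBERT's similarity form) rather than a minimum distance.

```python
import numpy as np


def weighted_chamfer_score(query_vecs, doc_vecs, weights=None):
    """Weighted Chamfer-style late-interaction score.

    query_vecs: (n_query_tokens, dim) array of query token vectors
    doc_vecs:   (n_doc_tokens, dim) array of precomputed document token vectors
    weights:    (n_query_tokens,) importance weights; None means all ones,
                which gives the unweighted score.
    """
    query_vecs = np.asarray(query_vecs, dtype=float)
    doc_vecs = np.asarray(doc_vecs, dtype=float)

    # Pairwise inner products between every query and document token.
    sims = query_vecs @ doc_vecs.T          # shape: (n_query, n_doc)

    # For each query token, keep only its best-matching document token.
    best_per_query_token = sims.max(axis=1)

    if weights is None:
        weights = np.ones_like(best_per_query_token)
    weights = np.asarray(weights, dtype=float)

    # Weighted sum over query token contributions.
    return float(weights @ best_per_query_token)
```

Only `weights` varies between the unweighted and weighted scores; `doc_vecs` is never modified, which matches the abstract's point that the document representations stay fixed while the token weights are trained.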

Taxonomy

Topics: Information Retrieval and Search Behavior · Topic Modeling · Advanced Image and Video Retrieval Techniques