ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction
Keshav Santhanam, Omar Khattab, Jon Saad-Falcon, Christopher Potts, Matei Zaharia

TL;DR
ColBERTv2 improves late interaction neural retrieval by combining aggressive residual compression with denoised supervision, reaching state-of-the-art retrieval quality while substantially shrinking the space footprint of the multi-vector representations.
Contribution
ColBERTv2 introduces a residual compression mechanism and a denoised supervision strategy that together improve both the quality and the space efficiency of late interaction IR models.
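To make the idea of residual compression concrete, here is a minimal sketch in plain Python. It assumes one common reading of the technique: each token vector is stored as the index of its nearest centroid plus a coarsely quantized residual (the difference from that centroid). The function names, the uniform bucket layout, and the `bits` and `bound` parameters are illustrative assumptions, not taken from the paper or its codebase.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def nearest_centroid(vec: Vector, centroids: Sequence[Vector]) -> int:
    """Return the index of the centroid closest to vec (squared L2 distance)."""
    def sq_dist(c: Vector) -> float:
        return sum((a - b) ** 2 for a, b in zip(vec, c))
    return min(range(len(centroids)), key=lambda i: sq_dist(centroids[i]))


def compress(vec: Vector, centroids: Sequence[Vector],
             bits: int = 2, bound: float = 0.5) -> Tuple[int, List[int]]:
    """Encode vec as (centroid id, one small integer code per dimension).

    Assumption: residuals are clipped to [-bound, bound] and split into
    2**bits equal-width buckets. Other bucket layouts are possible.
    """
    cid = nearest_centroid(vec, centroids)
    levels = 2 ** bits
    step = 2 * bound / levels
    codes = []
    for v, c in zip(vec, centroids[cid]):
        residual = min(max(v - c, -bound), bound)  # clip to the bucket range
        codes.append(min(int((residual + bound) / step), levels - 1))
    return cid, codes


def decompress(cid: int, codes: Sequence[int], centroids: Sequence[Vector],
               bits: int = 2, bound: float = 0.5) -> List[float]:
    """Approximate the original vector from the centroid and bucket midpoints."""
    step = 2 * bound / (2 ** bits)
    return [c + (-bound + (code + 0.5) * step)
            for c, code in zip(centroids[cid], codes)]


# Toy example with two hypothetical centroids in 2 dimensions.
centroids = [[0.0, 0.0], [1.0, 1.0]]
vec = [0.9, 1.2]
cid, codes = compress(vec, centroids)
approx = decompress(cid, codes, centroids)
```

With this layout, each dimension costs only `bits` bits plus a shared centroid index per vector, instead of a full floating-point value per dimension, which is where the space savings would come from.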
Findings
Achieves state-of-the-art retrieval quality both within and outside the training domain.
Reduces the space footprint of late interaction models by 6-10 times.
Outperforms previous late interaction models across benchmarks.
Abstract
Neural information retrieval (IR) has greatly advanced search and other knowledge-intensive language tasks. While many neural IR methods encode queries and documents into single-vector representations, late interaction models produce multi-vector representations at the granularity of each token and decompose relevance modeling into scalable token-level computations. This decomposition has been shown to make late interaction more effective, but it inflates the space footprint of these models by an order of magnitude. In this work, we introduce ColBERTv2, a retriever that couples an aggressive residual compression mechanism with a denoised supervision strategy to simultaneously improve the quality and space footprint of late interaction. We evaluate ColBERTv2 across a wide range of benchmarks, establishing state-of-the-art quality within and outside the training domain while reducing the…
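The token-level decomposition described in the abstract can be sketched as follows. This is a minimal illustration, assuming the usual late interaction scoring rule in which each query token vector is matched to its most similar document token vector and the per-token maxima are summed. The helper names and the toy vectors are hypothetical, and real systems would operate on batched tensors produced by an encoder rather than Python lists.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def maxsim_score(query_vecs: Sequence[Vector], doc_vecs: Sequence[Vector]) -> float:
    """Sum, over query tokens, of the best similarity with any document token."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


def rank(query_vecs: Sequence[Vector],
         docs: Dict[str, Sequence[Vector]]) -> List[str]:
    """Return document ids sorted from highest to lowest late interaction score."""
    scores = {doc_id: maxsim_score(query_vecs, vecs) for doc_id, vecs in docs.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy data: one vector per token, already normalized to unit length.
query = [[1.0, 0.0], [0.0, 1.0]]
docs = {
    "doc_a": [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]],
    "doc_b": [[0.6, 0.8], [0.8, 0.6]],
}
ranking = rank(query, docs)
```

Because every document keeps one vector per token, an index built this way grows with the total token count of the collection, which is the space cost the abstract refers to.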
Code & Models
- 🤗 colbert-ir/colbertv2.0 (model) · 13.7M dl · ♡ 317
- 🤗 Crystalcareai/Colbertv2 (model) · 95 dl
- 🤗 jinaai/jina-colbert-v1-en (model) · 267 dl · ♡ 100
- 🤗 LinWeizheDragon/ColBERT-v2 (model) · 3 dl · ♡ 3
- 🤗 bclavie/JaColBERTv2 (model) · 20k dl · ♡ 16
- 🤗 lightonai/colbertv2.0 (model) · 879 dl · ♡ 4
- 🤗 RobinAkan1/colbert-v1-tripclick (model)
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Information Retrieval and Search Behavior
