SPLATE: Sparse Late Interaction Retrieval
Thibault Formal, Stéphane Clinchant, Hervé Déjean, and Carlos Lassance

TL;DR
SPLATE introduces a lightweight adaptation of ColBERTv2 that enables efficient sparse retrieval for candidate generation, maintaining high effectiveness while significantly improving speed and CPU compatibility.
Contribution
The paper presents SPLATE, a method that maps ColBERTv2's frozen token embeddings to a sparse vocabulary space, so that traditional sparse retrieval techniques can handle candidate generation within late interaction pipelines.
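As a rough illustration of what mapping token embeddings to a sparse vocabulary space can look like, the sketch below applies SPLADE-style pooling: each token embedding is projected to vocabulary logits, passed through log(1 + ReLU(x)), and max-pooled over tokens. The projection matrix, the three-term vocabulary, and the name `sparse_vocab_weights` are hypothetical; the learned SPLATE adapter itself is not reproduced here.

```python
import math

def sparse_vocab_weights(token_embeddings, projection, top_k=None):
    """SPLADE-style pooling (illustrative only, not the SPLATE adapter).

    token_embeddings: list of per-token vectors (frozen in this sketch).
    projection: hypothetical matrix with one row per vocabulary term.
    Returns a dict {term_id: weight} holding only non-zero weights.
    """
    weights = [0.0] * len(projection)
    for emb in token_embeddings:
        for term_id, row in enumerate(projection):
            logit = sum(e * p for e, p in zip(emb, row))
            # log(1 + ReLU(logit)) dampens large activations
            activation = math.log1p(max(0.0, logit))
            # max-pool over tokens
            if activation > weights[term_id]:
                weights[term_id] = activation
    sparse = {t: w for t, w in enumerate(weights) if w > 0.0}
    if top_k is not None:
        # optionally keep only the top_k highest-weighted terms
        kept = sorted(sparse.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
        sparse = dict(kept)
    return sparse

# Toy input: two 2-dimensional token embeddings and a 3-term vocabulary.
toy_embeddings = [[1.0, 0.0], [0.0, 2.0]]
toy_projection = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
bow = sparse_vocab_weights(toy_embeddings, toy_projection)
bow_top1 = sparse_vocab_weights(toy_embeddings, toy_projection, top_k=1)
```

In this toy case the third vocabulary term only ever receives negative logits, so it drops out of the representation, which is what makes the output usable with an inverted index.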
Findings
Matches the effectiveness of the PLAID ColBERTv2 engine by re-ranking 50 candidate documents.
Runs in under 10ms on CPU environments.
Enables efficient candidate generation with sparse retrieval techniques.
Abstract
The late interaction paradigm introduced with ColBERT stands out in the neural Information Retrieval space, offering a compelling effectiveness-efficiency trade-off across many benchmarks. Efficient late interaction retrieval is based on an optimized multi-step strategy, where an approximate search first identifies a set of candidate documents to re-rank exactly. In this work, we introduce SPLATE, a simple and lightweight adaptation of the ColBERTv2 model which learns an "MLM adapter", mapping its frozen token embeddings to a sparse vocabulary space with a partially learned SPLADE module. This allows us to perform the candidate generation step in late interaction pipelines with traditional sparse retrieval techniques, making it particularly appealing for running ColBERT in CPU environments. Our SPLATE ColBERTv2 pipeline achieves the same effectiveness as the PLAID ColBERTv2 engine by…
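The two-step strategy the abstract describes (sparse candidate generation, then exact late interaction re-ranking) can be sketched as follows. This is a minimal toy under stated assumptions: the index layout, the name `retrieve_then_rerank`, and the brute-force scan standing in for an inverted index are all hypothetical, and the vectors are hand-written rather than produced by any model.

```python
def sparse_score(query_terms, doc_terms):
    """Dot product between two sparse {term_id: weight} dicts."""
    return sum(w * doc_terms[t] for t, w in query_terms.items() if t in doc_terms)

def maxsim(query_embs, doc_embs):
    """Late interaction score: for each query token take the best-matching
    document token (max dot product), then sum over query tokens."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    return sum(max(dot(q, d) for d in doc_embs) for q in query_embs)

def retrieve_then_rerank(query_terms, query_embs, index, k=50):
    """Stage 1: keep the k best documents by sparse score.
    Stage 2: re-rank those k candidates exactly with MaxSim.
    A real system would query an inverted index in stage 1; this
    sketch scans every document for simplicity."""
    candidates = sorted(
        index,
        key=lambda doc_id: sparse_score(query_terms, index[doc_id]["sparse"]),
        reverse=True,
    )[:k]
    return sorted(
        candidates,
        key=lambda doc_id: maxsim(query_embs, index[doc_id]["embs"]),
        reverse=True,
    )

# Hypothetical toy index: each document has a sparse vector and token embeddings.
toy_index = {
    "a": {"sparse": {0: 1.0}, "embs": [[1.0, 0.0]]},
    "b": {"sparse": {1: 1.0}, "embs": [[0.0, 1.0], [1.0, 0.0]]},
    "c": {"sparse": {2: 1.0}, "embs": [[1.0, 1.0]]},
}
toy_query_terms = {0: 1.0, 1: 0.5}
toy_query_embs = [[1.0, 0.0], [0.0, 1.0]]
ranking = retrieve_then_rerank(toy_query_terms, toy_query_embs, toy_index, k=2)
```

With `k=2`, document "c" is never re-ranked because it shares no terms with the query, even though its MaxSim score would be competitive. The quality of the final ranking is therefore bounded by what the sparse stage retrieves, which is why the candidate generator matters.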
Taxonomy
Topics: Semantic Web and Ontologies · Natural Language Processing Techniques · Biomedical Text Mining and Ontologies
Methods: Sparse Evolutionary Training
