A Replicability Study of XTR
Rohan Jha, Reno Kriz, Benjamin Van Durme

TL;DR
This study replicates the XTR retrieval algorithm and extends its evaluation, showing how XTR training affects retrieval efficiency and offering practical guidance for deploying it in modern retrieval systems.
Contribution
The paper replicates the original XTR retrieval algorithm and its modified training objective, extends the evaluation to knowledge-distillation (KD) training and efficient retrieval engines (PLAID and WARP), and clarifies XTR's practical utility.
Findings
XTR's effectiveness over ColBERT was not replicated under controlled conditions.
XTR training produces more discriminative centroid scores, improving IVF-based retrieval.
The benefits of XTR training extend beyond low-k regimes, aiding efficient retrieval.
Abstract
The XTR (conteXtual Token Retrieval) algorithm is a modification to ColBERT retrieval that avoids the costly step of fully gathering and reranking the candidates' embeddings by imputing their missing similarity scores from the initial token retrieval step. The original work proposes a modified training objective as necessary for effective XTR retrieval, arguing that standard ColBERT token scoring is unsuitable for imputation. In this paper, we replicate both the XTR retrieval algorithm and its modified training objective, and extend the evaluation to knowledge-distillation (KD) training and efficient retrieval engines (PLAID and WARP). We confirm the token-level matching characteristics claimed in the original work, but fail to replicate XTR's overall effectiveness advantage over ColBERT under a controlled comparison. We further show that XTR's training modification has a concrete…
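To make the imputation idea concrete, the following is a minimal sketch of XTR-style scoring, not the authors' implementation. It assumes that each query token has already retrieved a list of (document, similarity) token hits, and that a missing similarity is imputed with the lowest similarity that query token retrieved, which is our reading of the original XTR description. The function name `xtr_score` and the input layout are hypothetical.

```python
from collections import defaultdict


def xtr_score(retrieved):
    """Score candidate documents from token-level hits only.

    retrieved[i] is a list of (doc_id, similarity) pairs: the token hits
    returned for query token i by the initial token retrieval step.
    No document embeddings are gathered or reranked.
    """
    n = len(retrieved)

    # Best retrieved similarity for each (document, query token) pair.
    best = defaultdict(dict)
    for i, hits in enumerate(retrieved):
        for doc_id, sim in hits:
            if sim > best[doc_id].get(i, float("-inf")):
                best[doc_id][i] = sim

    # Assumed imputation value per query token: the lowest similarity among
    # its retrieved hits (0.0 if that token retrieved nothing).
    floor = [min((sim for _, sim in hits), default=0.0) for hits in retrieved]

    # Average over query tokens, filling gaps with the imputed value.
    scores = {}
    for doc_id, per_token in best.items():
        total = sum(per_token.get(i, floor[i]) for i in range(n))
        scores[doc_id] = total / n
    return scores


# Toy input: two query tokens, three candidate documents.
example = [
    [("d1", 0.9), ("d2", 0.8), ("d1", 0.7)],
    [("d2", 0.6), ("d3", 0.5)],
]
example_scores = xtr_score(example)
```

In the toy input, d1 has no hit for the second query token and d3 has none for the first, so those entries are filled with the per-token floor instead of being computed from the documents' full embeddings. This is the step that lets XTR skip the gather-and-rerank stage of ColBERT retrieval.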
