WARP: An Efficient Engine for Multi-Vector Retrieval

Jan Luca Scheerer; Matei Zaharia; Christopher Potts; Gustavo Alonso; Omar Khattab

arXiv:2501.17788·cs.IR·July 8, 2025

TL;DR

WARP is a retrieval engine for multi-vector models such as XTR and ColBERTv2. It cuts end-to-end latency by 41x relative to XTR's reference implementation and runs 3x faster than the ColBERTv2/PLAID engine, while preserving retrieval quality.

Contribution

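WARP introduces three techniques for efficient multi-vector retrieval: WARP$_\text{SELECT}$ for dynamic similarity imputation, implicit decompression that avoids reconstructing vectors during retrieval, and a two-stage reduction for score aggregation.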

Findings

1. 41x reduction in end-to-end latency compared to XTR's reference implementation

2. 3x speedup over the ColBERTv2/PLAID engine

3. Retrieval quality is preserved despite the efficiency improvements

Abstract

Multi-vector retrieval methods such as ColBERT and its recent variant, the ConteXtualized Token Retriever (XTR), offer high accuracy but face efficiency challenges at scale. To address this, we present WARP, a retrieval engine that substantially improves the efficiency of retrievers trained with the XTR objective through three key innovations: (1) WARP$_\text{SELECT}$ for dynamic similarity imputation; (2) implicit decompression, avoiding costly vector reconstruction during retrieval; and (3) a two-stage reduction process for efficient score aggregation. Combined with highly optimized C++ kernels, our system reduces end-to-end latency compared to XTR's reference implementation by 41x, and achieves a 3x speedup over the ColBERTv2/PLAID engine, while preserving retrieval quality.
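
The abstract names a two-stage reduction with similarity imputation but does not spell out the procedure. The Python sketch below is one plausible reading, not the paper's C++ implementation: a token-level stage keeps the best similarity per (document, query token) pair, and a document-level stage sums over query tokens, substituting an imputed score wherever a document was not retrieved for a token. The function name `two_stage_score`, the `(query_token, doc_id, similarity)` hit layout, and the fixed `missing_estimates` list are all assumptions made for illustration; in particular, WARP$_\text{SELECT}$ derives its estimates dynamically, whereas here they are simply passed in.

```python
from collections import defaultdict


def two_stage_score(hits, missing_estimates):
    """Aggregate token-level hits into per-document scores.

    hits: iterable of (query_token_index, doc_id, similarity) tuples
          (hypothetical layout, chosen for this sketch).
    missing_estimates: one imputed similarity per query token, used when
          a document has no hit for that token (assumed to be given).
    """
    # Stage 1 (token level): keep the maximum similarity seen for each
    # (document, query token) pair.
    best = defaultdict(dict)
    for q, doc, sim in hits:
        prev = best[doc].get(q)
        if prev is None or sim > prev:
            best[doc][q] = sim

    # Stage 2 (document level): sum over all query tokens, falling back
    # to the imputed estimate where the document has no hit.
    scores = {}
    for doc, per_token in best.items():
        scores[doc] = sum(
            per_token.get(q, estimate)
            for q, estimate in enumerate(missing_estimates)
        )
    return scores


# Two query tokens; "d2" was never retrieved for query token 1.
hits = [(0, "d1", 0.75), (0, "d1", 0.5), (1, "d1", 0.5), (0, "d2", 0.25)]
scores = two_stage_score(hits, missing_estimates=[0.125, 0.25])
print(scores)  # {'d1': 1.25, 'd2': 0.5}
```

In this toy input, "d1" scores 0.75 + 0.5 from its own hits, while "d2" scores 0.25 plus the imputed 0.25 for the query token it is missing.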

Taxonomy

Topics: Image Retrieval and Classification Techniques · Neural Networks and Applications · Semantic Web and Ontologies