TL;DR
LEMUR is a framework that reduces multi-vector similarity search to a single-vector search problem in a latent space, significantly reducing search latency while maintaining retrieval quality.
Contribution
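LEMUR applies two consecutive problem reductions: it first formulates multi-vector similarity search as a supervised learning problem solved by a one-hidden-layer neural network, then reduces inference under that model to single-vector similarity search. This allows efficient multi-vector retrieval using existing single-vector search indexes.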
Findings
LEMUR is an order of magnitude faster than previous multi-vector search methods.
The framework maintains high retrieval quality comparable to multi-vector approaches.
The code is publicly available on GitHub.
Abstract
Multi-vector representations generated by late interaction models, such as ColBERT, enable superior retrieval quality compared to single-vector representations in information retrieval applications. In multi-vector retrieval systems, both queries and documents are encoded using one embedding per token, and similarity between queries and documents is measured by the MaxSim similarity measure. However, the improved quality of multi-vector retrieval comes at the expense of significantly increased search latency. In this work, we introduce LEMUR, a simple yet efficient framework for multi-vector similarity search. LEMUR consists of two consecutive problem reductions: First, we formulate multi-vector similarity search as a supervised learning problem that can be solved using a one-hidden-layer neural network. Second, we reduce inference under this model to single-vector similarity search in…
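The MaxSim measure mentioned in the abstract can be illustrated with a minimal sketch. It assumes the standard late-interaction definition used by ColBERT: for each query token embedding, take the largest inner product with any document token embedding, then sum over query tokens. The names `dot`, `maxsim`, and `brute_force_search` are hypothetical and are not taken from the LEMUR code. The exhaustive scan shown here is the slow baseline that motivates the paper, not LEMUR's method.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    return sum(a * b for a, b in zip(u, v))


def maxsim(query: Sequence[Vector], doc: Sequence[Vector]) -> float:
    """MaxSim score: sum over query tokens of the best match among doc tokens."""
    if not doc:
        raise ValueError("document must contain at least one token embedding")
    return sum(max(dot(q, d) for d in doc) for q in query)


def brute_force_search(
    query: Sequence[Vector], docs: Sequence[Sequence[Vector]], k: int = 1
) -> List[int]:
    """Return the indices of the k documents with the highest MaxSim score.

    Every document is scored against the query, so the cost grows with the
    total number of token embeddings in the corpus.
    """
    scored = [(maxsim(query, doc), i) for i, doc in enumerate(docs)]
    # Sort by descending score; break ties by the lower document index.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in scored[:k]]


# Toy data (illustrative only): a two-token query and two short documents.
query = [[1.0, 0.0], [0.0, 1.0]]
docs = [
    [[1.0, 0.0]],              # matches only the first query token
    [[0.5, 0.5], [0.0, 1.0]],  # partially matches the first, fully the second
]
ranking = brute_force_search(query, docs, k=2)
```

On this toy data the second document scores 1.5 against 1.0 for the first, so it ranks higher. Because each query requires a pass over all document token embeddings, this baseline becomes expensive at scale, which is the latency cost the abstract describes.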
Taxonomy
Topics: Information Retrieval and Search Behavior · Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques
