On Strengths and Limitations of Single-Vector Embeddings
Archish S, Mihir Agarwal, Ankit Garg, Neeraj Kayal, Kirankumar Shiragur

TL;DR
This paper critically examines the limitations of single-vector embeddings for retrieval, highlighting domain shift and drowning effects, and compares single-vector models against multi-vector models.
Contribution
It demonstrates that factors beyond dimensionality affect single-vector embedding performance and shows their fundamental weaknesses compared to multi-vector approaches.
Findings
Domain shift and misalignment significantly impair single-vector retrieval.
Finetuning improves recall but causes catastrophic forgetting in single-vector models.
Single-vector models are more vulnerable to drowning effects as corpus size increases.
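The findings above are stated in terms of recall. As a point of reference, here is a minimal sketch of recall@k, assuming the common definition (the fraction of a query's relevant documents that appear in the top k results). The function name and argument names are illustrative and are not taken from the paper.

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the relevant documents found among the top-k ranked results.

    ranked_ids:   document ids ordered from most to least similar to the query
    relevant_ids: ids of the documents judged relevant for the query
    """
    relevant = set(relevant_ids)
    if not relevant:
        raise ValueError("recall is undefined when there are no relevant documents")
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant)
    return hits / len(relevant)
```

For example, with the ranking [3, 1, 7, 2] and relevant documents {1, 2}, only document 1 appears in the top 2, so recall@2 is 0.5.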
Abstract
Recent work (Weller et al., 2025) introduced a naturalistic dataset called LIMIT and showed empirically that a wide range of popular single-vector embedding models suffer substantial drops in retrieval quality, raising concerns about the reliability of single-vector embeddings for retrieval. Although Weller et al. (2025) proposed limited dimensionality as the main factor contributing to this, we show that dimensionality alone cannot explain the observed failures. We observe from results in (Alon et al., 2016) that (2k+1)-dimensional vector embeddings suffice for top-k retrieval. This result points to other drivers of poor performance. Controlling for tokenization artifacts and linguistic similarity between attributes yields only modest gains. In contrast, we find that domain shift and misalignment between embedding similarities and the task's underlying notion of relevance are major…
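To make the dimensionality point concrete, the sketch below shows one standard way that 2k+1 dimensions can be enough to retrieve any chosen set of k documents by inner product. It is a textbook moment-curve construction and is not necessarily the argument used in the paper or in Alon et al. (2016). Each document i is placed at the point (1, t, t^2, ..., t^(2k)) with t = i, and the query vector holds the coefficients of the polynomial p(t) = 0.5 - prod over relevant s of (t - s)^2, so that p is positive exactly on the relevant documents. All function names, and the choice of integer positions and the 0.5 offset, are assumptions made for illustration.

```python
import numpy as np
from numpy.polynomial import polynomial as P


def doc_embeddings(n_docs, k):
    """Embed document i as (1, t, t^2, ..., t^(2k)) with t = i: 2k+1 dimensions."""
    t = np.arange(n_docs, dtype=float)
    return np.vander(t, 2 * k + 1, increasing=True)


def query_embedding(relevant, k):
    """Coefficients (low to high degree) of p(t) = 0.5 - prod_s (t - s)^2.

    p(s) = 0.5 for every relevant s. For any other integer position the
    product is at least 1, so p is at most -0.5 there.
    """
    assert len(set(relevant)) == k, "expected exactly k distinct relevant documents"
    roots = [float(s) for s in relevant for _ in range(2)]  # each root twice
    q = -P.polyfromroots(roots)  # degree 2k, hence 2k+1 coefficients
    q[0] += 0.5
    return q


def top_k(docs, q, k):
    """Ids of the k documents with the largest inner product with q."""
    scores = docs @ q  # scores[i] equals p(i)
    return sorted(np.argsort(-scores)[:k].tolist())


docs = doc_embeddings(n_docs=10, k=3)
q = query_embedding(relevant=[1, 4, 8], k=3)
retrieved = top_k(docs, q, k=3)
```

The construction is exact only in principle: the entries grow like t^(2k), so for large corpora or large k floating-point error would swamp the 0.5 margin. It illustrates that a dimension bound exists, not that learned embeddings behave this way.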
