Generative Retrieval Overcomes Limitations of Dense Retrieval but Struggles with Identifier Ambiguity

Adrian Bracher; Svitlana Vakulenko

arXiv:2604.05764·cs.IR·April 9, 2026

TL;DR

This paper evaluates generative retrieval models on the synthetic LIMIT dataset, demonstrating their strengths and identifying challenges, such as identifier ambiguity, that limit their effectiveness.

Contribution

The paper provides the first evaluation of generative retrieval on the LIMIT dataset, highlighting both its advantages and the decoding issues that affect its performance.

Findings

1. Generative retrieval outperforms dense and sparse methods on the LIMIT dataset.

2. Adding hard negatives significantly degrades all models' performance.

3. Identifier ambiguity causes decoding failures in generative retrieval.
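
The identifier-ambiguity finding can be illustrated with a toy sketch. This summary does not say how the paper assigns document identifiers, so the example assumes one plausible reading: each document is referred to by a short string, and two documents may end up with the same string. The document ids, identifier strings, and helper names below are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict

def build_identifier_index(docs):
    """Map each identifier string to the list of document ids that carry it."""
    index = defaultdict(list)
    for doc_id, identifier in docs.items():
        index[identifier].append(doc_id)
    return dict(index)

def resolve(generated_identifier, index):
    """Return every document a generated identifier could refer to."""
    return index.get(generated_identifier, [])

def ambiguous_identifiers(index):
    """Keep only identifiers that are shared by more than one document."""
    return {ident: ids for ident, ids in index.items() if len(ids) > 1}

# Hypothetical corpus: d1 and d2 share the same identifier string.
docs = {
    "d1": "alex morgan",
    "d2": "alex morgan",
    "d3": "jamie lee",
}
index = build_identifier_index(docs)

# Even if a model generates the identifier perfectly, it cannot tell
# d1 from d2: the string alone does not single out one document.
candidates = resolve("alex morgan", index)
```

In this toy setting the failure is independent of model quality: a correctly generated identifier still resolves to more than one document, so the retrieved result is underdetermined.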

Abstract

While dense retrieval models, which embed queries and documents into a shared low-dimensional space, have gained widespread popularity, they were shown to exhibit important theoretical limitations and considerably lag behind traditional sparse retrieval models in certain settings. Generative retrieval has emerged as an alternative approach to dense retrieval by using a language model to predict query-document relevance directly. In this paper, we demonstrate strengths and weaknesses of generative retrieval approaches using a simple synthetic dataset, called LIMIT, that was previously introduced to empirically demonstrate the theoretical limitations of embedding-based retrieval but was not used to evaluate generative retrieval. We close this research gap and show that generative retrieval achieves the best performance on this dataset without any additional training required (0.92 and…
