TL;DR
This paper introduces GRIP, a unified retrieval-as-generation framework that integrates retrieval control into token-level decoding, enabling dynamic, end-to-end reasoning and evidence gathering within a single model.
Contribution
The paper embeds retrieval decisions in the generation process itself, so the model can carry out flexible, multi-step inference without an external retrieval controller.
Findings
GRIP outperforms strong RAG baselines on five QA benchmarks.
GRIP is competitive with GPT-4o while using fewer parameters.
The framework enables dynamic, on-the-fly evidence integration.
Abstract
We revisit retrieval-augmented generation (RAG) by embedding retrieval control directly into generation. Instead of treating retrieval as an external intervention, we express retrieval decisions within token-level decoding, enabling end-to-end coordination without additional controllers or classifiers. Under the paradigm of Retrieval as Generation, we propose GRIP (Generation-guided Retrieval with Information Planning), a unified framework in which the model regulates retrieval behavior through control-token emission. Central to GRIP is Self-Triggered Information Planning, which allows the model to decide when to retrieve, how to reformulate queries, and when to terminate, all within a single autoregressive trajectory. This design tightly couples retrieval and reasoning and supports dynamic multi-step inference with on-the-fly…
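The control-token idea in the abstract can be illustrated with a toy decoding loop. This is a minimal sketch under stated assumptions, not the authors' implementation: the token names (`[RETRIEVE]`, `[/RETRIEVE]`, `[ANSWER]`, `[EOS]`), the `[EVIDENCE]` prefix, and the helpers `decode_with_retrieval`, `scripted_model`, and `toy_retrieve` are all hypothetical, and a scripted stand-in replaces the language model. It only shows how the decision to retrieve, the query text, and termination can all be read off a single token stream.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical control tokens; the abstract does not name the real ones.
RETRIEVE = "[RETRIEVE]"     # model starts writing a retrieval query
END_QUERY = "[/RETRIEVE]"   # model finishes the query; retrieval fires
ANSWER = "[ANSWER]"         # model starts the final answer
EOS = "[EOS]"               # model terminates the trajectory


def decode_with_retrieval(
    next_token: Callable[[List[str]], str],
    retrieve: Callable[[str], str],
    question: str,
    max_steps: int = 64,
) -> Tuple[List[str], List[str]]:
    """Run one autoregressive trajectory in which the model's own tokens
    decide when to retrieve, what to query, and when to stop."""
    context: List[str] = [question]
    queries: List[str] = []
    query_buf: Optional[List[str]] = None  # non-None while a query is being written
    for _ in range(max_steps):
        tok = next_token(context)
        context.append(tok)
        if tok == EOS:
            break
        if tok == RETRIEVE:
            query_buf = []
        elif tok == END_QUERY and query_buf is not None:
            query = " ".join(query_buf)
            queries.append(query)
            # Evidence is spliced into the context so later tokens can use it.
            context.append("[EVIDENCE] " + retrieve(query))
            query_buf = None
        elif query_buf is not None:
            query_buf.append(tok)
    return context, queries


def scripted_model(script: List[str]) -> Callable[[List[str]], str]:
    """Stand-in for a language model: replays a fixed token script."""
    it = iter(script)
    return lambda context: next(it, EOS)


# Toy corpus and retriever (assumptions for illustration only).
CORPUS: Dict[str, str] = {"capital France": "Paris is the capital of France."}


def toy_retrieve(query: str) -> str:
    return CORPUS.get(query, "no evidence found")


script = [RETRIEVE, "capital", "France", END_QUERY, ANSWER, "Paris", EOS]
context, queries = decode_with_retrieval(
    scripted_model(script), toy_retrieve, "What is the capital of France?"
)
print(queries)
print(context)
```

In this sketch no separate controller or classifier decides whether to retrieve: the loop only reacts to tokens the model emits, which is the coupling the abstract describes. A real system would replace `scripted_model` with a trained model whose vocabulary includes the control tokens.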
