Retrieval from Within: An Intrinsic Capability of Attention-Based Models
Elad Hoffer, Yochai Blau, Edan Kinderman, Ron Banner, Daniel Soudry, Boris Ginsburg

TL;DR
This paper introduces INTRA (INTrinsic Retrieval via Attention), a framework in which an attention-based encoder-decoder retrieves evidence from its own internal representations, improving question-answering performance without a separate retrieval module.
Contribution
INTRA demonstrates that attention-based models inherently possess a retrieval capability that can be harnessed, unifying retrieval and generation in a single model.
Findings
INTRA outperforms strong engineered retrieval pipelines on evidence recall.
INTRA improves end-to-end answer quality on question-answering benchmarks.
The framework reuses precomputed encoder states across queries, amortizing the cost of context encoding.
Abstract
Retrieval-augmented generation (RAG) typically treats retrieval and generation as separate systems. We ask whether an attention-based encoder-decoder can instead retrieve directly from its own internal representations. We introduce INTRA (INTrinsic Retrieval via Attention), a framework where decoder attention queries score pre-encoded evidence chunks that are then directly reused as context for generation. By construction, INTRA unifies retrieval and generation, eliminating the retriever-generator mismatch typical of RAG pipelines. This design also amortizes context encoding by reusing precomputed encoder states across queries. On question-answering benchmarks, INTRA outperforms strong engineered retrieval pipelines on both evidence recall and end-to-end answer quality. Our results demonstrate that attention-based models already possess a retrieval mechanism that can be elicited, rather…
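The mechanism described in the abstract (a decoder query scores pre-encoded chunks, and the selected chunks' cached states are reused as context) can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the scoring function, so the scaled dot product, the max over a chunk's token states, the top-k selection, and all names (`score_chunk`, `retrieve`, `build_context`, `chunk_cache`) are assumptions made for illustration.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def score_chunk(query: Vector, chunk_states: Sequence[Vector]) -> float:
    # Assumption: a chunk's score is the largest scaled dot product between
    # the decoder query and any cached encoder state in that chunk.
    scale = len(query) ** 0.5
    return max(dot(query, state) for state in chunk_states) / scale


def retrieve(query: Vector, chunk_cache: Sequence[Sequence[Vector]], k: int) -> List[int]:
    """Return the indices of the k highest-scoring chunks, best first."""
    scores = [score_chunk(query, states) for states in chunk_cache]
    ranked = sorted(range(len(chunk_cache)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


def build_context(chunk_cache: Sequence[Sequence[Vector]], selected: Sequence[int]) -> List[Vector]:
    # The cached encoder states of the selected chunks are concatenated and
    # reused as-is; nothing is re-encoded at query time.
    return [state for i in selected for state in chunk_cache[i]]


# Toy cache: three chunks whose encoder states were computed once, offline.
chunk_cache = [
    [[1.0, 0.0], [0.5, 0.5]],  # chunk 0
    [[0.0, 1.0], [0.0, 2.0]],  # chunk 1
    [[-1.0, 0.0]],             # chunk 2
]
query = [0.0, 1.0]  # stand-in for a decoder attention query vector

top = retrieve(query, chunk_cache, k=2)
context = build_context(chunk_cache, top)
```

In this toy setup the same vectors serve as both retrieval keys and generation context, which is the sense in which retrieval and generation share one model. A real system would take the query and states from trained attention layers rather than hand-written lists.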
