Retrieval from Within: An Intrinsic Capability of Attention-Based Models

Elad Hoffer; Yochai Blau; Edan Kinderman; Ron Banner; Daniel Soudry; Boris Ginsburg

arXiv:2605.05806·cs.LG·May 11, 2026

TL;DR

This paper introduces INTRA (INTrinsic Retrieval via Attention), a framework in which an attention-based encoder-decoder retrieves evidence directly from its own internal representations, improving question-answering performance without a separate retrieval module.

Contribution

INTRA demonstrates that attention-based models inherently possess a retrieval capability that can be harnessed, unifying retrieval and generation in a single model.

Findings

01. INTRA outperforms strong engineered retrieval pipelines on evidence recall.

02. INTRA improves end-to-end answer quality on question-answering benchmarks.

03. INTRA amortizes context encoding by reusing precomputed encoder states across queries.

Abstract

Retrieval-augmented generation (RAG) typically treats retrieval and generation as separate systems. We ask whether an attention-based encoder-decoder can instead retrieve directly from its own internal representations. We introduce INTRA (INTrinsic Retrieval via Attention), a framework where decoder attention queries score pre-encoded evidence chunks that are then directly reused as context for generation. By construction, INTRA unifies retrieval and generation, eliminating the retriever-generator mismatch typical of RAG pipelines. This design also amortizes context encoding by reusing precomputed encoder states across queries. On question-answering benchmarks, INTRA outperforms strong engineered retrieval pipelines on both evidence recall and end-to-end answer quality. Our results demonstrate that attention-based models already possess a retrieval mechanism that can be elicited, rather…
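The mechanism described in the abstract can be illustrated with a small sketch: a decoder attention query scores pre-encoded evidence chunks, and the encoder states of the best-scoring chunks are reused directly as generation context. This is not the authors' implementation. The abstract does not say how per-token attention scores are aggregated into a chunk score, so the log-sum-exp aggregation below is an assumption, as are the names `chunk_score` and `retrieve`, the top-k selection rule, and the toy vectors.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Chunk = List[Vector]  # one precomputed encoder state per token in the chunk


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def chunk_score(query: Vector, chunk: Chunk) -> float:
    """Score a chunk by the log-sum-exp of scaled dot-product attention logits.

    ASSUMPTION: log-sum-exp aggregation is chosen for illustration only;
    the source does not specify how token-level scores are combined.
    """
    scale = 1.0 / math.sqrt(len(query))
    logits = [scale * dot(query, state) for state in chunk]
    peak = max(logits)  # subtract the max for numerical stability
    return peak + math.log(sum(math.exp(v - peak) for v in logits))


def retrieve(query: Vector, chunks: List[Chunk], k: int) -> Tuple[List[int], Chunk]:
    """Return the indices of the k best-scoring chunks and their concatenated states."""
    scores = [chunk_score(query, c) for c in chunks]
    top = sorted(range(len(chunks)), key=lambda i: scores[i], reverse=True)[:k]
    # The stored encoder states are reused as context; nothing is re-encoded.
    context = [state for i in top for state in chunks[i]]
    return top, context


# Toy data (hypothetical): three chunks of two 2-dimensional encoder states each.
chunks = [
    [[1.0, 0.0], [0.9, 0.1]],
    [[0.0, 1.0], [0.1, 0.9]],
    [[-1.0, 0.0], [-0.9, -0.1]],
]
query = [2.0, 0.0]  # stands in for a decoder attention query
top, context = retrieve(query, chunks, k=2)
print(top, len(context))
```

Because the chunk states are computed once and only indexed at query time, the same `chunks` list can serve any number of queries, which mirrors the amortization the abstract describes. In a real encoder-decoder the query and states would come from the model's attention projections, not from hand-written vectors.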
