Latent Abstraction for Retrieval-Augmented Generation

Ha Lan N.T; Minh-Anh Nguyen; Dung D. Le

arXiv:2604.17866·cs.CL·May 8, 2026

TL;DR

LAnR introduces a unified LLM framework that encodes, retrieves, and generates within its latent space, improving retrieval efficiency and performance on QA tasks by eliminating separate retriever components.

Contribution

The paper proposes a unified approach in which a single LLM performs retrieval and generation jointly in its latent space, removing the need for explicit query generation and a separate retriever module.

Findings

01. LAnR outperforms existing RAG methods on six QA benchmarks.

02. It reduces the number of retrieval calls, enhancing inference efficiency.

03. The model's answer token entropy signals retrieval sufficiency effectively.
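
The third finding can be illustrated with a toy entropy check. The sketch below is not the paper's procedure: the function names, the averaging over answer tokens, and the `threshold` value are all assumptions made here for illustration. It only shows how the entropy of next-token distributions could serve as a signal for whether more retrieval might be needed.

```python
import math


def token_entropy(probs):
    """Shannon entropy (in nats) of a single next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mean_answer_entropy(step_distributions):
    """Average entropy over the distributions of the answer tokens.

    Averaging is an assumption for this sketch; the source does not say
    how per-token entropies are aggregated.
    """
    if not step_distributions:
        raise ValueError("need at least one token distribution")
    total = sum(token_entropy(p) for p in step_distributions)
    return total / len(step_distributions)


def should_retrieve_more(step_distributions, threshold=0.5):
    """Return True when the answer looks uncertain.

    `threshold` is a hypothetical value, not one taken from the paper.
    """
    return mean_answer_entropy(step_distributions) > threshold


# A peaked distribution (confident answer) versus a flat one (uncertain).
confident = [[0.97, 0.01, 0.01, 0.01]]
uncertain = [[0.25, 0.25, 0.25, 0.25]]
print(should_retrieve_more(confident))
print(should_retrieve_more(uncertain))
```

Under these assumptions, a peaked distribution yields low entropy and stops retrieval, while a flat distribution yields high entropy and triggers another retrieval step.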

Abstract

Retrieval-Augmented Generation (RAG) has become a standard approach for enhancing large language models (LLMs) with external knowledge, mitigating hallucinations, and improving factuality. However, existing systems rely on generating natural language queries at each hop and maintaining a strict architectural separation between retriever and generator, preventing them from leveraging the full representational capacity of the LLM. We propose LAnR (Latent Abstraction for RAG), a unified framework in which a single LLM jointly performs encoding, retrieval, and generation entirely within its own latent space. Rather than generating textual queries, LAnR produces dense retrieval vectors from the hidden states of a designated [PRED] token and uses them to match against encoded document representations from the same model. Furthermore, LAnR adaptively decides when sufficient…
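
The matching step described in the abstract can be sketched with plain vectors. This is a minimal illustration, not the authors' implementation: the vectors stand in for hidden states that a real model would produce, cosine similarity is an assumed scoring function (the source does not name one), and `retrieve_top_k`, `pred_vector`, and `doc_vectors` are hypothetical names.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def retrieve_top_k(pred_vector, doc_vectors, k=2):
    """Rank documents by similarity to the query vector and keep the top k.

    pred_vector: stand-in for the hidden state at the [PRED] token.
    doc_vectors: dict mapping document id to its encoded representation,
                 assumed to come from the same model.
    """
    scored = [(cosine(pred_vector, vec), doc_id)
              for doc_id, vec in doc_vectors.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored[:k]]


# Toy 2-dimensional "hidden states"; real ones would be much larger.
docs = {"d1": [1.0, 0.0], "d2": [0.0, 1.0], "d3": [0.9, 0.1]}
print(retrieve_top_k([1.0, 0.0], docs, k=2))  # ['d1', 'd3']
```

Because the query and document vectors are assumed to come from the same model, no text query is generated and no separate retriever is consulted in this sketch; the only operation is a similarity search over stored vectors.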
