LatentRAG: Latent Reasoning and Retrieval for Efficient Agentic RAG

Yijia Zheng; Marcel Worring

arXiv:2605.06285 · cs.CL · May 8, 2026

TL;DR

LatentRAG moves reasoning and retrieval in agentic RAG from discrete language space into a continuous latent space, cutting inference latency by approximately 90% while keeping accuracy comparable to explicit methods on complex question answering tasks.

Contribution

The paper proposes a latent-space framework that aligns reasoning and retrieval, enabling multi-step question answering at lower latency.

Findings

01 Achieves accuracy comparable to explicit methods on benchmark datasets.

02 Reduces inference latency by approximately 90%.

03 Supports end-to-end joint optimization of reasoning and retrieval.

Abstract

Single-step retrieval-augmented generation (RAG) provides an efficient way to incorporate external information for simple question answering tasks but struggles with complex questions. Agentic RAG extends this paradigm by replacing single-step retrieval with a multi-step process, in which the large language model (LLM) acts as a search agent that generates intermediate thoughts and subqueries to iteratively interact with the retrieval system. This iterative process incurs substantial latency due to the autoregressive generation of lengthy thoughts and subqueries. To address this limitation, we propose LatentRAG, a novel framework that shifts both reasoning and retrieval from discrete language space to continuous latent space. Unlike existing explicit methods that generate natural language thoughts or subqueries token-by-token, LatentRAG produces latent tokens for thoughts and subqueries…
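
The contrast drawn in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: every name here (`explicit_agent`, `latent_agent`, the `toy_*` stubs, the two-document corpus) is hypothetical, and the sketch assumes that a latent token can serve directly as a dense query vector scored by dot product, which the abstract does not confirm. It only shows where the explicit loop spends decoding steps that a latent loop would skip.

```python
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]

# Toy corpus with hand-made 2-d embeddings (illustrative only).
DOCS = [
    "Paris is the capital of France.",
    "The Seine flows through Paris.",
]
DOC_VECS = [(1.0, 0.0), (0.0, 1.0)]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def retrieve(query_vec: Vector, k: int = 1) -> List[str]:
    """Return the k documents with the highest dot-product score."""
    ranked = sorted(
        range(len(DOCS)),
        key=lambda i: dot(query_vec, DOC_VECS[i]),
        reverse=True,
    )
    return [DOCS[i] for i in ranked[:k]]


def explicit_agent(
    question: str,
    generate_subquery: Callable[[str, List[str]], str],
    embed: Callable[[str], Vector],
    steps: int,
) -> Tuple[List[str], int]:
    """Explicit loop: decode a text subquery at each step, then embed it."""
    context: List[str] = []
    decoded_tokens = 0
    for _ in range(steps):
        subquery = generate_subquery(question, context)
        # Whitespace split is a crude proxy for autoregressive decoding steps.
        decoded_tokens += len(subquery.split())
        context.extend(retrieve(embed(subquery)))
    return context, decoded_tokens


def latent_agent(
    question: str,
    latent_step: Callable[[str, List[str]], Vector],
    steps: int,
) -> List[str]:
    """Latent loop (assumed): each step emits a vector used as the query."""
    context: List[str] = []
    for _ in range(steps):
        context.extend(retrieve(latent_step(question, context)))
    return context


# Hypothetical stubs standing in for the LLM and the text encoder.
def toy_generate_subquery(question: str, context: List[str]) -> str:
    if not context:
        return "what is the capital of France"
    return "which river flows through Paris"


def toy_embed(text: str) -> Vector:
    return (1.0, 0.0) if "capital" in text else (0.0, 1.0)


def toy_latent_step(question: str, context: List[str]) -> Vector:
    return (1.0, 0.0) if not context else (0.0, 1.0)


QUESTION = "Which river flows through the capital of France?"
explicit_context, decoded = explicit_agent(
    QUESTION, toy_generate_subquery, toy_embed, steps=2
)
latent_context = latent_agent(QUESTION, toy_latent_step, steps=2)
print(decoded, explicit_context == latent_context)
```

In this toy setup both loops retrieve the same two documents, but the explicit loop decodes 11 subquery tokens along the way while the latent loop decodes none. Real latency savings depend on model size, thought length, and retriever cost, none of which this sketch models.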
