R3-RAG: Learning Step-by-Step Reasoning and Retrieval for LLMs via Reinforcement Learning

Yuan Li; Qi Luo; Xiaonan Li; Bufan Li; Qinyuan Cheng; Bo Wang; Yining Zheng; Yuxin Wang; Zhangyue Yin; Xipeng Qiu

arXiv:2505.23794·cs.CL·October 27, 2025

TL;DR

R3-RAG introduces a reinforcement learning approach for large language models to learn step-by-step reasoning and retrieval, significantly improving factual accuracy and retrieval relevance in knowledge-intensive tasks.

Contribution

The paper presents R3-RAG, a novel reinforcement learning framework enabling LLMs to iteratively reason and retrieve external knowledge, surpassing previous prompt-based methods.

Findings

1. R3-RAG outperforms baseline models in accuracy.

2. It effectively transfers to different retrievers.

3. The method enhances step-by-step reasoning and retrieval relevance.

Abstract

Retrieval-Augmented Generation (RAG) integrates external knowledge with Large Language Models (LLMs) to enhance factual correctness and mitigate hallucination. However, dense retrievers often become the bottleneck of RAG systems due to their limited parameters compared to LLMs and their inability to perform step-by-step reasoning. While prompt-based iterative RAG attempts to address these limitations, it is constrained by human-designed workflows. To address these limitations, we propose R3-RAG, which uses Reinforcement learning to make the LLM learn how to Reason and Retrieve step by step, thus retrieving comprehensive external knowledge and leading to correct answers. R3-RAG is divided into two stages. We first use cold start to make the model learn the manner of iteratively interleaving reasoning and retrieval. Then we use reinforcement…
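
The interleaving of reasoning and retrieval described in the abstract can be pictured as a simple control loop. The following is a minimal sketch only, not the authors' implementation: the function `iterative_rag`, the `policy` and `retrieve` callables, the `max_steps` budget, and the toy corpus are all hypothetical names introduced for illustration. The cold-start and reinforcement learning stages are not modeled; the sketch assumes an already trained policy that, at each step, either issues a search query or commits to an answer.

```python
from typing import Callable, List, Tuple

# Hypothetical type for illustration; not taken from the paper.
# A step is either ("retrieve", query) or ("answer", text).
Step = Tuple[str, str]


def iterative_rag(
    question: str,
    policy: Callable[[str, List[str]], Step],
    retrieve: Callable[[str], List[str]],
    max_steps: int = 5,
) -> Tuple[str, List[str]]:
    """Alternate between a reasoning step and a retrieval call until the policy answers."""
    evidence: List[str] = []
    for _ in range(max_steps):
        action, text = policy(question, evidence)
        if action == "answer":
            return text, evidence
        # Otherwise treat the text as a search query and accumulate the results.
        evidence.extend(retrieve(text))
    # Step budget exhausted without an answer.
    return "", evidence


# Toy stand-ins for the LLM and the retriever (assumptions, for demonstration only).
CORPUS = {
    "director of Inception": ["Inception was directed by Christopher Nolan."],
    "birthplace of Christopher Nolan": ["Christopher Nolan was born in London."],
}


def toy_retrieve(query: str) -> List[str]:
    # Exact-match lookup; a real system would use a dense or sparse retriever.
    return CORPUS.get(query, [])


def toy_policy(question: str, evidence: List[str]) -> Step:
    # Scripted two-hop behavior: find the director, then the birthplace, then answer.
    if not evidence:
        return ("retrieve", "director of Inception")
    if len(evidence) == 1:
        return ("retrieve", "birthplace of Christopher Nolan")
    return ("answer", "London")


answer, evidence = iterative_rag(
    "Where was the director of Inception born?", toy_policy, toy_retrieve
)
print(answer)
print(len(evidence))
```

In this toy run the scripted policy issues two queries before answering, so the second query can depend on what the first one returned. That dependency is what a single-shot retriever cannot express, and it is the behavior the abstract says is learned rather than fixed by a human-designed workflow.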

Code & Models

Repositories

yuan-li-fnlp/r3-rag (PyTorch, official)

Taxonomy

Methods: Linear Warmup With Linear Decay · Attention Dropout · Byte Pair Encoding · Dense Connections · Softmax · Layer Normalization · Dropout · BERT · BART