ProRAG: Process-Supervised Reinforcement Learning for Retrieval-Augmented Generation

Zhao Wang; Ziliang Zhao; Zhicheng Dou

arXiv:2601.21912 · cs.AI · January 30, 2026

TL;DR

ProRAG introduces a process-supervised reinforcement learning framework that enhances retrieval-augmented generation by providing step-level feedback, leading to improved reasoning accuracy in complex multi-hop tasks.

Contribution

The paper presents an on-policy RL method that combines step-level supervision with a process reward model, addressing reward sparsity and process hallucinations in RAG.

Findings

1. Outperforms existing RL baselines on five reasoning benchmarks.

2. Effectively reduces process hallucinations and improves reasoning quality.

3. Enhances long-horizon task performance with fine-grained process feedback.

Abstract

Reinforcement learning (RL) has become a promising paradigm for optimizing Retrieval-Augmented Generation (RAG) in complex reasoning tasks. However, traditional outcome-based RL approaches often suffer from reward sparsity and inefficient credit assignment, as coarse-grained scalar rewards fail to identify specific erroneous steps within long-horizon trajectories. This ambiguity frequently leads to "process hallucinations", where models reach correct answers through flawed logic or redundant retrieval steps. Although recent process-aware approaches attempt to mitigate this via static preference learning or heuristic reward shaping, they often lack the on-policy exploration capabilities required to decouple step-level credit from global outcomes. To address these challenges, we propose ProRAG, a process-supervised reinforcement learning framework designed to integrate learned step-level…
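The credit-assignment problem described in the abstract can be illustrated with a toy calculation. The sketch below is not ProRAG's reward design; the function names, the step scores, and the choice to add the outcome reward to the last step are all assumptions made for illustration. It compares the return-to-go each step receives when only the final answer is rewarded against the case where every step also carries its own score.

```python
from typing import List


def discounted_returns(rewards: List[float], gamma: float = 1.0) -> List[float]:
    """Compute the return-to-go for every step of one trajectory."""
    returns: List[float] = []
    running = 0.0
    # Walk backwards so each step accumulates all rewards that follow it.
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def outcome_only_returns(num_steps: int, final_reward: float,
                         gamma: float = 1.0) -> List[float]:
    """Sparse setting: only the last step receives a reward."""
    rewards = [0.0] * (num_steps - 1) + [final_reward]
    return discounted_returns(rewards, gamma)


def step_level_returns(step_scores: List[float], final_reward: float,
                       gamma: float = 1.0) -> List[float]:
    """Dense setting (hypothetical): each step has its own score,
    and the outcome reward is added on top of the last step."""
    rewards = list(step_scores)
    rewards[-1] += final_reward
    return discounted_returns(rewards, gamma)


# Hypothetical 3-step trajectory: step 2 is a redundant retrieval,
# yet the final answer is correct (reward 1.0).
sparse = outcome_only_returns(num_steps=3, final_reward=1.0)
dense = step_level_returns(step_scores=[0.5, -0.5, 0.5], final_reward=1.0)

print("outcome-only:", sparse)  # every step gets identical credit
print("step-level:  ", dense)   # the flawed step gets a lower return
```

With the outcome-only signal, all three steps receive the same return, so the flawed step is indistinguishable from the good ones. With per-step scores, the return for step 2 drops below its neighbours, which is the kind of finer-grained signal the abstract argues is needed.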

Taxonomy

Topics: Multimodal Machine Learning Applications · Recommender Systems and Techniques · Reinforcement Learning in Robotics