ProRAG: Process-Supervised Reinforcement Learning for Retrieval-Augmented Generation
Zhao Wang, Ziliang Zhao, Zhicheng Dou

TL;DR
ProRAG introduces a process-supervised reinforcement learning framework that enhances retrieval-augmented generation by providing step-level feedback, leading to improved reasoning accuracy in complex multi-hop tasks.
Contribution
The paper presents a novel on-policy RL method with step-level supervision and a process reward model, addressing reward sparsity and process hallucinations in RAG.
Findings
Outperforms existing RL baselines on five reasoning benchmarks.
Effectively reduces process hallucinations and improves reasoning quality.
Enhances long-horizon task performance with fine-grained process feedback.
Abstract
Reinforcement learning (RL) has become a promising paradigm for optimizing Retrieval-Augmented Generation (RAG) in complex reasoning tasks. However, traditional outcome-based RL approaches often suffer from reward sparsity and inefficient credit assignment, as coarse-grained scalar rewards fail to identify specific erroneous steps within long-horizon trajectories. This ambiguity frequently leads to "process hallucinations", where models reach correct answers through flawed logic or redundant retrieval steps. Although recent process-aware approaches attempt to mitigate this via static preference learning or heuristic reward shaping, they often lack the on-policy exploration capabilities required to decouple step-level credit from global outcomes. To address these challenges, we propose ProRAG, a process-supervised reinforcement learning framework designed to integrate learned step-level…
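Illustration (not from the paper)
The credit-assignment problem described in the abstract can be shown with a toy example. The sketch below is not ProRAG's method. The function names, the per-step scores, and the blending weight are all hypothetical, chosen only to contrast an outcome-only reward with a step-level signal on a trajectory that reaches a correct answer through one redundant retrieval step.

```python
from typing import List


def outcome_only_credit(num_steps: int, final_reward: float) -> List[float]:
    """Assign the single trajectory-level reward to every step."""
    return [final_reward] * num_steps


def step_level_credit(
    step_scores: List[float], final_reward: float, weight: float = 0.5
) -> List[float]:
    """Blend hypothetical per-step scores with the final outcome reward.

    The linear blend and the weight are illustrative assumptions,
    not the reward design used in the paper.
    """
    return [weight * s + (1.0 - weight) * final_reward for s in step_scores]


# Toy 4-step trajectory: the final answer is correct (reward 1.0),
# but the second step (index 1) is a redundant retrieval scored 0.0.
step_scores = [1.0, 0.0, 1.0, 1.0]
final_reward = 1.0

outcome = outcome_only_credit(len(step_scores), final_reward)
blended = step_level_credit(step_scores, final_reward)

print("outcome-only:", outcome)
print("step-level:  ", blended)
```

With the outcome-only signal, the redundant step is rewarded exactly like the useful ones. With the blended signal, it receives a lower value than its neighbours, which is the kind of step-level distinction a process reward is meant to provide.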
Taxonomy
Topics: Multimodal Machine Learning Applications · Recommender Systems and Techniques · Reinforcement Learning in Robotics
