PRL: Process Reward Learning Improves LLMs' Reasoning Ability and Broadens the Reasoning Boundary

Jiarui Yao; Ruida Wang; Tong Zhang

arXiv:2601.10201·cs.LG·January 16, 2026

TL;DR

PRL introduces a theoretically grounded method for fine-grained process supervision in reinforcement learning, significantly enhancing LLMs' reasoning capabilities and expanding their reasoning boundaries.

Contribution

The paper proposes Process Reward Learning (PRL), a framework that decomposes the entropy-regularized RL objective into intermediate steps with rigorously derived process rewards, improving reasoning in LLMs.

Findings

01. PRL improves LLM reasoning performance, as measured by average@n.

02. PRL broadens the reasoning boundary, as shown by the pass@n metric.

03. Extensive experiments confirm PRL's effectiveness and generalizability.
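
The page does not define the average@n and pass@n metrics named in the findings. The sketch below assumes the common definitions: average@n is the mean accuracy over the n sampled answers per problem, and pass@n is the fraction of problems where at least one of the n samples is correct. The function names and the toy `results` grid are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def average_at_n(correct: Sequence[Sequence[bool]]) -> float:
    """Mean over problems of the fraction of the n samples that are correct.

    Assumed definition; the paper may compute it differently.
    """
    return sum(sum(row) / len(row) for row in correct) / len(correct)


def pass_at_n(correct: Sequence[Sequence[bool]]) -> float:
    """Fraction of problems with at least one correct sample among the n.

    This is the direct empirical version, not the unbiased pass@k estimator.
    """
    return sum(any(row) for row in correct) / len(correct)


# Hypothetical toy data: 3 problems, n = 4 samples each (True = correct answer).
results = [
    [True, False, False, False],
    [False, False, False, False],
    [True, True, True, False],
]

print(f"average@4 = {average_at_n(results):.3f}")
print(f"pass@4    = {pass_at_n(results):.3f}")
```

Under these assumed definitions, average@n reflects typical single-sample quality, while pass@n reflects whether a correct answer is reachable at all within n attempts, which is why the findings associate it with the reasoning boundary.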

Abstract

Improving the reasoning abilities of Large Language Models (LLMs) has been an active research topic recently. However, most related work relies on outcome rewards at the trajectory level and therefore lacks fine-grained supervision during the reasoning process. Existing training frameworks that try to incorporate process signals into LLM optimization also rely heavily on tedious additional steps, such as MCTS or training a separate reward model, which harm training efficiency. Moreover, the intuition behind the design of these process signals lacks rigorous theoretical support, leaving the optimization mechanism opaque. In this paper, we propose Process Reward Learning (PRL), which decomposes the entropy-regularized reinforcement learning objective into intermediate steps, with rigorous process rewards that can be assigned to the model accordingly. Starting from theoretical…
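
The abstract is truncated before it states the objective, so the following is only a sketch of one common form of a regularized RL objective and of how it splits across steps. It assumes a KL (relative-entropy) penalty toward a reference policy; the symbols $r$, $\beta$, $\pi_{\mathrm{ref}}$, and $a_t$ are assumptions for illustration, and the paper's exact formulation and process rewards may differ.

```latex
% Assumed trajectory-level objective: outcome reward minus a KL penalty.
J(\pi) = \mathbb{E}_{x \sim \mathcal{D},\; y \sim \pi(\cdot \mid x)}\big[ r(x, y) \big]
       - \beta \, \mathbb{E}_{x \sim \mathcal{D}}\Big[ \mathrm{KL}\big( \pi(\cdot \mid x) \,\|\, \pi_{\mathrm{ref}}(\cdot \mid x) \big) \Big]

% A response y = (a_1, \dots, a_T) is generated autoregressively, so the
% log-ratio factorizes over steps:
\log \frac{\pi(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}
  = \sum_{t=1}^{T} \log \frac{\pi(a_t \mid x, a_{<t})}{\pi_{\mathrm{ref}}(a_t \mid x, a_{<t})}

% Substituting gives an objective whose regularizer is a sum of per-step terms:
J(\pi) = \mathbb{E}_{x \sim \mathcal{D},\; y \sim \pi(\cdot \mid x)}\Big[ r(x, y)
       - \beta \sum_{t=1}^{T} \log \frac{\pi(a_t \mid x, a_{<t})}{\pi_{\mathrm{ref}}(a_t \mid x, a_{<t})} \Big]
```

The per-step sum is what makes a step-level treatment possible in this assumed form: the regularizer already separates into one term per intermediate step, even though the outcome reward $r(x, y)$ is defined only on the full trajectory.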

Taxonomy

Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Advanced Graph Neural Networks