Is PRM Necessary? Problem-Solving RL Implicitly Induces PRM Capability in LLMs

Zhangying Feng; Qianglong Chen; Ning Lu; Yongqian Li; Siqi Cheng; Shuangmu Peng; Duyu Tang; Shengcai Liu; Zhirui Zhang

arXiv:2505.11227 · cs.AI · December 9, 2025

TL;DR

This paper shows that pure reinforcement learning (RL) can improve both the reasoning and the process reward model (PRM) capabilities of large language models without explicit PRM training, challenging the belief that process supervision is necessary.

Contribution

It demonstrates that pure RL training enhances reasoning and PRM abilities simultaneously, and introduces Self-PRM, a self-evaluation framework that improves solution accuracy.

Findings

01. Pure RL training improves reasoning without PRMs.

02. Current PRMs underperform simple baselines.

03. Self-PRM enhances accuracy but faces challenges on difficult problems.
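
To make the comparison in the findings concrete, here is a minimal sketch of two ways to pick a final answer from several sampled solutions. It is an illustration under stated assumptions, not the paper's implementation. `majority_vote` stands in for a simple baseline and is assumed here to be plain answer counting. `best_of_n` reranks solutions with a scoring function, which could be an external PRM or, in the Self-PRM spirit, the same model judging its own solutions. The function names and the `score` callable are hypothetical.

```python
from collections import Counter
from typing import Callable, Sequence


def majority_vote(answers: Sequence[str]) -> str:
    """Return the most frequent final answer; ties go to the answer seen first."""
    if not answers:
        raise ValueError("need at least one sampled answer")
    return Counter(answers).most_common(1)[0][0]


def best_of_n(
    solutions: Sequence[str],
    answers: Sequence[str],
    score: Callable[[str], float],
) -> str:
    """Return the final answer of the solution with the highest score.

    `score` is a hypothetical stand-in for any solution-level scorer
    (an external PRM, or the policy model evaluating its own output).
    """
    if not solutions or len(solutions) != len(answers):
        raise ValueError("solutions and answers must be non-empty and aligned")
    best_index = max(range(len(solutions)), key=lambda i: score(solutions[i]))
    return answers[best_index]


# Toy usage with made-up data: four sampled answers, then three scored solutions.
sampled_answers = ["42", "41", "42", "7"]
voted = majority_vote(sampled_answers)

toy_scores = {"solution A": 0.1, "solution B": 0.9, "solution C": 0.5}
reranked = best_of_n(
    solutions=list(toy_scores),
    answers=["1", "2", "3"],
    score=lambda s: toy_scores[s],
)
```

Under this framing, a reranker is only useful when its selected answer is correct more often than the voted one, which is the kind of comparison the second finding refers to.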

Abstract

The development of reasoning capabilities represents a critical frontier in large language models (LLMs) research, where reinforcement learning (RL) and process reward models (PRMs) have emerged as predominant methodological frameworks. Contrary to conventional wisdom, empirical evidence from DeepSeek-R1 demonstrates that pure RL training focused on mathematical problem-solving can progressively enhance reasoning abilities without PRM integration, challenging the perceived necessity of process supervision. In this study, we conduct a systematic investigation of the relationship between RL training and PRM capabilities. Our findings demonstrate that problem-solving proficiency and process supervision capabilities represent complementary dimensions of reasoning that co-evolve synergistically during pure RL training. In particular, current PRMs underperform simple baselines like majority…

Taxonomy

Topics: Artificial Intelligence in Healthcare and Education · Topic Modeling · Text Readability and Simplification