Proof-RM: A Scalable and Generalizable Reward Model for Math Proof

Haotong Yang; Zitong Wang; Shijia Kang; Siqi Yang; Wenkai Yu; Xu Niu; Yike Sun; Yi Hu; Zhouchen Lin; Muhan Zhang

arXiv:2602.02377 · cs.CL · February 20, 2026

TL;DR

This paper introduces Proof-RM, a scalable reward model trained on diverse, high-quality proof data to reliably evaluate mathematical proofs, enhancing LLM reasoning and verification capabilities.

Contribution

We develop a scalable data pipeline and a proof-checking reward model that improves proof verification accuracy and generalization for mathematical reasoning in LLMs.

Findings

01. Proof-RM achieves high reward accuracy.
02. The model generalizes well across problem types.
03. Proof-RM is effective at guiding LLM proof generation.

Abstract

While Large Language Models (LLMs) have demonstrated strong math reasoning abilities through Reinforcement Learning with Verifiable Rewards (RLVR), many advanced mathematical problems are proof-based, and there is no guaranteed way to determine whether a proof is correct by simple answer matching. To enable automatic verification, a Reward Model (RM) capable of reliably evaluating full proof processes is required. In this work, we design a scalable data-construction pipeline that, with minimal human effort, leverages LLMs to generate a large quantity of high-quality "question-proof-check" triplet data. By systematically varying problem sources, generation methods, and model configurations, we create diverse problem-proof pairs spanning multiple difficulty levels, linguistic styles, and error types, which are then filtered through hierarchical human review for label alignment. Utilizing…
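The abstract describes the data pipeline only at a high level. As a rough illustration, the Python sketch below assumes a hypothetical `Triplet` record for "question-proof-check" data, enumerates combinations of problem sources, generation methods, and model configurations with `itertools.product`, and models hierarchical review as a sequence of hypothetical reviewer callables whose verdicts must all agree with a triplet's label. `answer_match_reward` is included only to contrast RLVR-style answer matching with proof checking. None of these names, fields, or filtering rules come from the paper.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Iterable


@dataclass(frozen=True)
class Triplet:
    """One hypothetical "question-proof-check" record (fields are assumptions)."""
    question: str
    proof: str
    check: str   # written critique produced by a checker model
    label: bool  # True if the proof is judged correct


def answer_match_reward(predicted: str, reference: str) -> float:
    """RLVR-style reward: exact match on a final answer, no proof inspection."""
    return 1.0 if predicted.strip() == reference.strip() else 0.0


def build_grid(sources: list[str], methods: list[str],
               configs: list[str]) -> list[tuple[str, str, str]]:
    """Enumerate every (source, method, config) combination to diversify proofs."""
    return list(product(sources, methods, configs))


Reviewer = Callable[[Triplet], bool]  # returns the reviewer's own verdict


def filter_by_review(candidates: Iterable[Triplet],
                     stages: list[Reviewer]) -> list[Triplet]:
    """Keep a triplet only if every review stage agrees with its label.

    Hypothetical stand-in for hierarchical human review: stages run in order
    and all() stops at the first stage that disagrees.
    """
    return [t for t in candidates
            if all(stage(t) == t.label for stage in stages)]


# Usage with toy data (all values are illustrative).
grid = build_grid(["competition", "textbook"], ["direct", "perturbed"],
                  ["small-model", "large-model"])
print(len(grid))  # 8

q = "Show that n^2 >= n for every integer n >= 1."
t1 = Triplet(q, "Since n >= 1, n * n >= 1 * n = n.", "Valid.", True)
t2 = Triplet(q, "It holds for n = 1, hence for all n.",
             "Checks only one case.", False)
t3 = Triplet(q, "It holds for n = 1, hence for all n.",
             "Checks only one case.", True)  # deliberately mislabeled


def spots_gap(t: Triplet) -> bool:
    # Toy reviewer: rejects proofs whose critique flags a single-case check.
    return "only one case" not in t.check


kept = filter_by_review([t1, t2, t3], [spots_gap])
print(len(kept))  # 2
```

A real pipeline would presumably call LLMs to produce a proof and a check for each grid cell and rely on human reviewers rather than string heuristics; the sketch only shows how combinatorial variation and label-agreement filtering fit together, with the mislabeled `t3` being the one triplet discarded.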

Taxonomy

Topics: Topic Modeling · Mathematics, Computing, and Information Processing · Natural Language Processing Techniques