Beyond Outcome Verification: Verifiable Process Reward Models for Structured Reasoning
Massimiliano Pronesti, Anya Belz, Yufang Hou

TL;DR
This paper introduces Verifiable Process Reward Models (VPRMs), a reinforcement learning framework that uses deterministic, rule-based verifiers for intermediate reasoning steps, improving the coherence, accuracy, and transparency of LLMs in structured reasoning tasks.
Contribution
VPRMs provide a novel, transparent approach to process supervision in reinforcement learning, replacing neural judges with rule-based verifiers for better reasoning trace verification.
Findings
VPRMs achieve up to 20% higher F1 scores than state-of-the-art models.
VPRMs outperform verifiable outcome rewards by 6.5% in F1 score.
VPRMs improve evidence grounding and logical coherence in reasoning.
Abstract
Recent work on reinforcement learning with verifiable rewards (RLVR) has shown that large language models (LLMs) can be substantially improved using outcome-level verification signals, such as unit tests for code or exact-match checks for mathematics. In parallel, process supervision has long been explored as a way to shape the intermediate reasoning behaviour of LLMs, but existing approaches rely on neural judges to score chain-of-thought steps, leaving them vulnerable to opacity, bias, and reward hacking. To address this gap, we introduce Verifiable Process Reward Models (VPRMs), a reinforcement-learning framework in which intermediate reasoning steps are checked by deterministic, rule-based verifiers. We apply VPRMs to risk-of-bias assessment for medical evidence synthesis, a domain where guideline-defined criteria and rule-based decision paths enable programmatic verification of…
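To make the idea of a deterministic, rule-based step verifier concrete, here is a minimal sketch in Python. It is not the authors' implementation: the question ids (`q1`, `q2`), the decision rule in `expected_judgement`, the use of reference answers, and the equal weighting of the two reward terms are all hypothetical and chosen only for illustration. It assumes a reasoning trace can be parsed into discrete answered questions followed by a final judgement.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Step:
    """One intermediate reasoning step (hypothetical structure)."""
    question: str  # id of a guideline question, e.g. "q1" (made up)
    answer: str    # the model's answer: "yes", "no", or "unclear"


def expected_judgement(answers: Dict[str, str]) -> str:
    """Hypothetical decision path; not taken from any real guideline."""
    if answers.get("q1") == "yes" and answers.get("q2") == "yes":
        return "low"
    if answers.get("q1") == "no":
        return "high"
    return "some concerns"


def process_reward(steps: List[Step],
                   reference: Dict[str, str],
                   final_judgement: str) -> float:
    """Score a trace with deterministic checks only (no neural judge).

    Term 1: fraction of steps whose answer matches the reference answer.
    Term 2: whether the final judgement follows from the model's own
            step answers under the decision rule.
    The 0.5 / 0.5 weighting is an illustrative assumption.
    """
    if steps:
        matches = sum(1 for s in steps if reference.get(s.question) == s.answer)
        step_score = matches / len(steps)
    else:
        step_score = 0.0

    model_answers = {s.question: s.answer for s in steps}
    consistent = 1.0 if final_judgement == expected_judgement(model_answers) else 0.0

    return 0.5 * step_score + 0.5 * consistent


trace = [Step("q1", "yes"), Step("q2", "no")]
reference_answers = {"q1": "yes", "q2": "yes"}
reward = process_reward(trace, reference_answers, "some concerns")
print(reward)
```

In this toy trace, one of the two step answers matches the reference, and the final judgement is consistent with the model's own answers, so the trace receives partial credit. Because every check is a plain rule, the reason for any given reward can be read directly from the code, which is the transparency property the abstract contrasts with neural judges.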
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education · Machine Learning in Healthcare · Topic Modeling
