Verifiable Process Rewards for Agentic Reasoning

Huining Yuan; Zelai Xu; Huaijie Wang; Xiangmin Yi; Jiaxuan Gao; Xiao-Ping Zhang; Yu Wang; Chao Yu; Yi Wu

arXiv:2605.10325·cs.AI·May 12, 2026



TL;DR

This paper introduces Verifiable Process Rewards (VPR), a framework that turns verifiable intermediate signals into dense rewards for reinforcement learning on agentic reasoning tasks. It improves on outcome-level reward baselines and transfers to general and agentic reasoning benchmarks.

Contribution

VPR converts oracle checks on intermediate actions into dense turn-level supervision signals, improving credit assignment and the reasoning capabilities of large language models.

Findings

1. VPR outperforms outcome-level reward baselines in controlled environments.

2. VPR transfers effectively to general and agentic reasoning benchmarks.

3. Dense verifier-grounded rewards improve long-horizon credit assignment.

Abstract

Reinforcement learning from verifiable rewards (RLVR) has improved the reasoning abilities of large language models (LLMs), but most existing approaches rely on sparse outcome-level feedback. This sparsity creates a credit assignment challenge in long-horizon agentic reasoning: a trajectory may fail despite containing many correct intermediate decisions, or succeed despite containing flawed ones. In this work, we study a class of densely-verifiable agentic reasoning problems, where intermediate actions can be objectively checked by symbolic or algorithmic oracles. We propose Verifiable Process Rewards (VPR), a framework that converts such oracles into dense turn-level supervision for reinforcement learning, and instantiate it in three representative settings: search-based verification for dynamic deduction, constraint-based verification for logical reasoning, and posterior-based…
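To make the contrast between sparse outcome-level feedback and dense turn-level supervision concrete, here is a minimal Python sketch. It is not the paper's implementation: the toy Latin-square puzzle, the oracle `is_consistent`, the reward values of +1 and -1, and the helper names `turn_level_rewards` and `outcome_reward` are all assumptions chosen for illustration. The oracle only checks row and column constraints, loosely in the spirit of the constraint-based verification setting named above.

```python
from typing import List, Tuple

Grid = List[List[int]]         # 0 marks an empty cell
Action = Tuple[int, int, int]  # (row, col, value)


def is_consistent(grid: Grid, action: Action) -> bool:
    """Hypothetical oracle: True if the placement violates no row/column constraint."""
    r, c, v = action
    n = len(grid)
    if not (0 <= r < n and 0 <= c < n and 1 <= v <= n):
        return False
    if grid[r][c] != 0:            # cell already filled
        return False
    if v in grid[r]:               # value already used in this row
        return False
    if any(grid[i][c] == v for i in range(n)):  # value already used in this column
        return False
    return True


def turn_level_rewards(grid: Grid, actions: List[Action]) -> List[float]:
    """Dense signal: one oracle-checked reward per action (assumed +1 / -1)."""
    grid = [row[:] for row in grid]  # work on a copy
    rewards = []
    for action in actions:
        ok = is_consistent(grid, action)
        rewards.append(1.0 if ok else -1.0)
        if ok:                       # only consistent placements are applied
            r, c, v = action
            grid[r][c] = v
    return rewards


def outcome_reward(grid: Grid, actions: List[Action]) -> float:
    """Sparse signal: a single reward, 1.0 only if the grid ends up fully filled."""
    grid = [row[:] for row in grid]
    for action in actions:
        if is_consistent(grid, action):
            r, c, v = action
            grid[r][c] = v
    return 1.0 if all(v != 0 for row in grid for v in row) else 0.0


start = [[1, 0],
         [0, 0]]
flawed = [(0, 1, 2), (1, 0, 1), (1, 1, 1)]   # second action repeats 1 in column 0
print(turn_level_rewards(start, flawed))      # per-action rewards
print(outcome_reward(start, flawed))          # single trajectory-level reward
```

In this toy trajectory the outcome reward marks the whole episode as a failure, while the turn-level rewards single out the one inconsistent action and still credit the two valid ones. That is the credit assignment gap the abstract describes, shown here under the simplifying assumptions listed above.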

