Promoting Efficient Reasoning with Verifiable Stepwise Reward

Chuhuai Yue; Chengqi Dong; Yinan Gao; Hang He; Jiajun Chai; Guojun Yin; Wei Lin

arXiv:2508.10293·cs.AI·August 19, 2025



TL;DR

This paper introduces a rule-based verifiable stepwise reward mechanism for large reasoning models, which reduces overthinking and improves efficiency without sacrificing reasoning accuracy.

Contribution

The paper proposes a rule-based verifiable stepwise reward mechanism (VSRM) that scores the intermediate states of a reasoning trajectory, encouraging effective steps and penalizing ineffective ones to make complex reasoning more efficient.

Findings

01 Significant reduction in output length while maintaining reasoning performance

02 Effective suppression of ineffective reasoning steps

03 Improved pass@k scores indicating better reasoning efficiency
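
For readers unfamiliar with the metric in the third finding, pass@k is the probability that at least one of k sampled answers is correct. The sketch below uses the widely used unbiased estimator, 1 - C(n-c, k) / C(n, k), computed from n samples of which c are correct. This summary does not say which estimator the paper uses, so treat the choice as an assumption; the function name `pass_at_k` is illustrative.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k from n sampled answers, c of which are correct.

    Uses the common unbiased estimator 1 - C(n - c, k) / C(n, k).
    Assumption: the paper's exact evaluation protocol is not given here.
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # With fewer than k incorrect samples, every size-k subset
    # must contain at least one correct answer.
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # 10 samples, 3 correct: chance that a single draw is correct.
    print(pass_at_k(n=10, c=3, k=1))
```

With k = 1 the estimator reduces to the plain fraction of correct samples, c / n.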

Abstract

Large reasoning models (LRMs) have recently achieved significant progress in complex reasoning tasks, aided by reinforcement learning with verifiable rewards. However, LRMs often suffer from overthinking, expending excessive computation on simple problems and reducing efficiency. Existing efficient reasoning methods typically require accurate task assessment to preset token budgets or select reasoning modes, which limits their flexibility and reliability. In this work, we revisit the essence of overthinking and identify that encouraging effective steps while penalizing ineffective ones is key to its solution. To this end, we propose a novel rule-based verifiable stepwise reward mechanism (VSRM), which assigns rewards based on the performance of intermediate states in the reasoning trajectory. This approach is intuitive and naturally fits the step-by-step nature of reasoning tasks. We…
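
The idea of rewarding each step according to the performance of the intermediate state it produces can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes each intermediate state already has an accuracy estimate (for example, the fraction of sampled continuations from that prefix that reach the verified answer), and it takes the reward of a step to be the change in that accuracy. The names `stepwise_rewards`, `initial_accuracy`, and `stall_penalty` are hypothetical.

```python
from math import isclose
from typing import List


def stepwise_rewards(
    step_accuracies: List[float],
    initial_accuracy: float = 0.0,
    stall_penalty: float = 0.0,
) -> List[float]:
    """Assign one reward per reasoning step from intermediate-state accuracy.

    Assumptions (not taken from the paper):
    - step_accuracies[i] is the estimated accuracy after step i.
    - A step's reward is its accuracy gain over the previous state.
    - A step with no gain receives -stall_penalty (hypothetical knob).
    """
    rewards: List[float] = []
    previous = initial_accuracy
    for accuracy in step_accuracies:
        gain = accuracy - previous
        if isclose(gain, 0.0, abs_tol=1e-12):
            # Ineffective step: the state is no closer to a correct answer.
            rewards.append(-stall_penalty)
        else:
            # Effective steps earn a positive reward, harmful ones a negative one.
            rewards.append(gain)
        previous = accuracy
    return rewards


if __name__ == "__main__":
    # Steps 1 and 2 help, step 3 adds nothing, step 4 hurts.
    print(stepwise_rewards([0.25, 0.75, 0.75, 0.5], stall_penalty=0.1))
```

Because the reward depends only on verifiable outcomes at each intermediate state, a scheme like this needs no preset token budget or per-task reasoning mode.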

