Step Potential Advantage Estimation: Harnessing Intermediate Confidence and Correctness for Efficient Mathematical Reasoning

Fei Wu; Zhenrong Zhang; Qikai Chang; Jianshu Zhang; Quan Liu; Jun Du

arXiv:2601.03823 · cs.CL · January 8, 2026

Open Access

TL;DR

This paper introduces Step Potential Advantage Estimation (SPAE), an advantage estimation method that uses step-level confidence and correctness signals to improve both the accuracy and the efficiency of reasoning in large language models trained with reinforcement learning.

Contribution

The paper presents a training-free probing mechanism for extracting step-level reasoning signals and a new advantage estimation method that enhances credit assignment and reduces response length.

Findings

01 SPAE improves accuracy across multiple benchmarks.

02 SPAE reduces response length significantly.

03 SPAE outperforms existing RL and advantage estimation methods.

Abstract

Reinforcement Learning with Verifiable Rewards (RLVR) elicits long chain-of-thought reasoning in large language models (LLMs), but outcome-based rewards lead to coarse-grained advantage estimation. While existing approaches improve RLVR via token-level entropy or sequence-level length control, they lack a semantically grounded, step-level measure of reasoning progress. As a result, LLMs fail to distinguish necessary deduction from redundant verification: they may continue checking after reaching a correct solution and, in extreme cases, overturn a correct trajectory into an incorrect final answer. To remedy the lack of process supervision, we introduce a training-free probing mechanism that extracts intermediate confidence and correctness and combines them into a Step Potential signal that explicitly estimates the reasoning state at each step. Building on this signal, we propose Step…
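
The abstract does not spell out how confidence and correctness are computed or combined, so the following is only an illustrative sketch under stated assumptions, not the paper's formulation. It assumes that probing a step means sampling a few forced final answers from the reasoning prefix ending at that step, that confidence is the share of probes agreeing with the majority answer, and that correctness is the share matching the reference answer. The combination rule in `step_potential` and the per-step gain (a difference of consecutive potentials, borrowed from potential-based reward shaping) are likewise hypothetical, as are all function names.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def step_signals(probed_answers: Sequence[str], reference: str) -> Tuple[float, float]:
    """Return (confidence, correctness) for one reasoning step.

    ASSUMPTION: `probed_answers` are final answers sampled from the model when it
    is forced to answer right after this step. Both definitions are illustrative.
    """
    counts = Counter(probed_answers)
    _, majority_count = counts.most_common(1)[0]
    confidence = majority_count / len(probed_answers)      # agreement among probes
    correctness = counts[reference] / len(probed_answers)  # share matching the reference
    return confidence, correctness


def step_potential(confidence: float, correctness: float) -> float:
    """HYPOTHETICAL combination: near +1 when confidently correct,
    near -1 when confidently wrong, near 0 when undecided."""
    return confidence * (2.0 * correctness - 1.0)


def step_gains(potentials: Sequence[float], initial: float = 0.0) -> List[float]:
    """Per-step change in potential (potential-based shaping, swapped in here
    as a stand-in for a step-level advantage term)."""
    gains = []
    previous = initial
    for value in potentials:
        gains.append(value - previous)
        previous = value
    return gains


# Toy trajectory with three steps and four probes per step; the reference answer is "4".
probes = [["4", "5", "6", "4"], ["4", "4", "4", "5"], ["4", "4", "4", "4"]]
potentials = [step_potential(*step_signals(p, "4")) for p in probes]
gains = step_gains(potentials)
print(potentials)  # [0.0, 0.375, 1.0]
print(gains)       # [0.0, 0.375, 0.625]
```

In this toy run the potential reaches 1.0 at the third step, so any further steps that only re-check the answer would earn a gain of zero. A signal of this kind is what would let a trainer tell necessary deduction apart from redundant verification.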

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI)