AdaptiveStep: Automatically Dividing Reasoning Step through Model Confidence

Yuliang Liu; Junjie Lu; Zhaoling Chen; Chaofeng Qu; Jason Klein Liu; Chonghan Liu; Zefan Cai; Yunhui Xia; Li Zhao; Jiang Bian; Chuheng Zhang; Wei Shen; Zhouhan Lin

arXiv:2502.13943 · cs.AI · June 3, 2025

Open Access · 1 Repo · 2 Datasets

TL;DR

AdaptiveStep introduces a confidence-based approach to dividing reasoning steps in process reward models, improving performance and efficiency in mathematical reasoning and code generation tasks without manual annotations.

Contribution

AdaptiveStep presents a confidence-driven method for dividing reasoning steps, which improves reward model training and outperforms existing step-division strategies in mathematical reasoning and code generation.

Findings

01 Achieves state-of-the-art performance in mathematical reasoning and code generation

02 Reduces PRM construction costs by over 30%

03 Improves the transferability and generalization of reward models

Abstract

Current approaches to training Process Reward Models (PRMs) often break responses into multiple reasoning steps using rule-based techniques, such as splitting at predefined placeholder tokens or fixing each reasoning step at a set length. These approaches overlook the fact that specific words do not typically mark true decision points in a text. To address this, we propose AdaptiveStep, a method that divides reasoning steps based on the model's confidence in predicting the next word. This division method provides more decision-making information at each step, which benefits downstream tasks such as reward model learning. Moreover, our method does not require manual annotation. We demonstrate its effectiveness through experiments with AdaptiveStep-trained PRMs in mathematical reasoning and code generation tasks. Experimental results indicate that the outcome PRM achieves…
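The core idea of confidence-based step division can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `split_by_confidence`, the fixed `threshold` of 0.8, and the choice to start a new step just before each low-confidence token are all assumptions made for illustration. The per-token confidences are assumed to be the probabilities the model assigned to each token it generated.

```python
from typing import List, Sequence


def split_by_confidence(
    tokens: Sequence[str],
    confidences: Sequence[float],
    threshold: float = 0.8,  # hypothetical value, chosen only for this sketch
) -> List[List[str]]:
    """Group tokens into steps, opening a new step at each low-confidence token.

    A token whose predicted probability falls below `threshold` is treated
    as a decision point, so it becomes the first token of a new step.
    """
    if len(tokens) != len(confidences):
        raise ValueError("tokens and confidences must have the same length")

    steps: List[List[str]] = []
    current: List[str] = []
    for token, confidence in zip(tokens, confidences):
        # Close the running step when the model was unsure about this token.
        if confidence < threshold and current:
            steps.append(current)
            current = []
        current.append(token)
    if current:
        steps.append(current)
    return steps


# Toy example with made-up confidences (not real model output).
example_tokens = ["2", "+", "3", "=", "5", ".", "So", "x"]
example_confidences = [0.90, 0.95, 0.90, 0.99, 0.60, 0.97, 0.50, 0.90]
example_steps = split_by_confidence(example_tokens, example_confidences)
```

In this toy run the boundaries fall before "5" and before "So", the two tokens with confidence below the assumed threshold, so the split follows the model's uncertainty rather than a fixed token or length rule.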

Code & Models

Repositories

lux0926/asprm (PyTorch, official)

Taxonomy

Topics: Semantic Web and Ontologies · Intelligent Tutoring Systems and Adaptive Learning · Business Process Modeling and Analysis