SAT: Balancing Reasoning Accuracy and Efficiency with Stepwise Adaptive Thinking
Weiyang Huang, Xuefeng Bai, Kehai Chen, Xinyang Chen, Yibin Chen, Weili Guan, and Min Zhang

TL;DR
SAT is a framework that adaptively prunes reasoning steps in large reasoning models, reducing reasoning-token usage by up to 40% while generally maintaining or improving accuracy across multiple benchmarks.
Contribution
SAT introduces a stepwise, difficulty-aware pruning method that uses a finite-state machine and a lightweight process reward model to compress easy reasoning steps while preserving depth for hard ones.
Findings
Achieves up to 40% reduction in reasoning tokens.
Generally maintains or improves accuracy across 9 models and 7 benchmarks.
Demonstrates effectiveness of adaptive, difficulty-aware reasoning pruning.
Abstract
Large Reasoning Models (LRMs) have revolutionized complex problem-solving, yet they exhibit pervasive "overthinking", generating unnecessarily long reasoning chains. While current solutions improve token efficiency, they often sacrifice fine-grained control or risk disrupting the logical integrity of the reasoning process. To address this, we introduce Stepwise Adaptive Thinking (SAT), a framework that performs step-level, difficulty-aware pruning while preserving the core reasoning structure. SAT formulates reasoning as a Finite-State Machine (FSM) with distinct thinking modes (Slow, Normal, Fast, Skip). It navigates these states dynamically using a lightweight Process Reward Model (PRM), compressing easy steps while preserving depth for hard ones. Experiments across 9 LRMs and 7 benchmarks show that SAT achieves up to 40% reduction in reasoning tokens while generally maintaining or…
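The abstract's idea of moving between thinking modes based on a per-step PRM signal can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the ordering of the modes, the one-step-at-a-time transition rule, the thresholds `low` and `high`, the reading of a higher score as an easier step, and the function names `transition` and `run_fsm`. The sketch takes precomputed scores as input and does not call a real PRM.

```python
from enum import Enum
from typing import List


class Mode(Enum):
    SLOW = "slow"
    NORMAL = "normal"
    FAST = "fast"
    SKIP = "skip"


# Assumed ordering from most to least deliberate thinking.
ORDER = [Mode.SLOW, Mode.NORMAL, Mode.FAST, Mode.SKIP]


def transition(state: Mode, prm_score: float,
               low: float = 0.4, high: float = 0.8) -> Mode:
    """Hypothetical transition rule: a high score (step looks easy)
    moves one mode toward SKIP, a low score (step looks hard) moves
    one mode toward SLOW, and anything in between keeps the mode."""
    i = ORDER.index(state)
    if prm_score >= high:
        i = min(i + 1, len(ORDER) - 1)
    elif prm_score < low:
        i = max(i - 1, 0)
    return ORDER[i]


def run_fsm(scores: List[float], start: Mode = Mode.NORMAL) -> List[Mode]:
    """Return the mode chosen for each step, given one score per step."""
    state = start
    trace = []
    for score in scores:
        state = transition(state, score)
        trace.append(state)
    return trace


if __name__ == "__main__":
    # Two easy steps, one neutral step, then three hard steps.
    for mode in run_fsm([0.9, 0.9, 0.5, 0.2, 0.1, 0.1]):
        print(mode.value)
```

In this sketch the mode changes by at most one position per step, so a single noisy score cannot jump directly from Slow to Skip. Whether SAT constrains its transitions in this way is not stated in the abstract.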
