SmartThinker: Learning to Compress and Preserve Reasoning by Step-Level Length Control
Xingyang He, Xiao Ling, Jie Liu

TL;DR
SmartThinker introduces a two-stage, step-level length control framework for reasoning models, reducing redundancy and improving efficiency without sacrificing accuracy by adaptively managing reasoning step lengths.
Contribution
It presents a step-level length control method, Step-Level Length Control Policy Optimization (SCPO), which allocates length to each reasoning step according to its importance, improving both efficiency and reasoning quality.
Findings
Reduces redundant reasoning steps significantly.
Achieves comparable or better accuracy than existing methods.
Demonstrates effectiveness across multiple benchmarks.
Abstract
Large reasoning models (LRMs) have exhibited remarkable reasoning capabilities through inference-time scaling, but this progress has also introduced considerable redundancy and inefficiency into their reasoning processes, resulting in substantial computational waste. Previous work has attempted to mitigate this issue by penalizing the overall length of generated samples during reinforcement learning (RL), with the goal of encouraging more concise chains of thought. However, we observe that such global length penalties often lead to excessive compression of critical reasoning steps while preserving unnecessary details in simpler ones, yielding a suboptimal trade-off between accuracy and efficiency. To address this issue, we propose SmartThinker, a two-stage learnable framework designed to enable fine-grained control over the length of reasoning chains based on the importance of each…
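The contrast the abstract draws between a global length penalty and step-level control can be illustrated with a toy reward calculation. This is only a sketch under stated assumptions, not the paper's SCPO objective: the function names, the `alpha` coefficient, and the per-step `importance` weights are all hypothetical and chosen for illustration.

```python
from typing import List


def global_penalty_reward(correct: bool, step_lengths: List[int],
                          alpha: float = 0.001) -> float:
    """Correctness reward minus a penalty on the total token count.

    The penalty depends only on the sum of step lengths, so it cannot
    tell which steps were shortened.
    """
    return float(correct) - alpha * sum(step_lengths)


def step_weighted_reward(correct: bool, step_lengths: List[int],
                         importance: List[float],
                         alpha: float = 0.001) -> float:
    """Correctness reward minus a penalty discounted by step importance.

    `importance` holds one hypothetical weight in [0, 1] per step;
    tokens in important steps are penalized less than tokens in
    unimportant ones. This weighting is an assumption of the sketch.
    """
    if len(step_lengths) != len(importance):
        raise ValueError("need one importance weight per step")
    penalty = sum((1.0 - w) * n for n, w in zip(step_lengths, importance))
    return float(correct) - alpha * penalty


# A three-step chain of [200, 100, 100] tokens whose first step is critical.
importance = [0.9, 0.1, 0.1]
trim_critical = [100, 100, 100]   # 100 tokens removed from the critical step
trim_redundant = [200, 50, 50]    # 100 tokens removed from the other steps

# The global penalty scores both trimmed chains identically.
g_critical = global_penalty_reward(True, trim_critical)
g_redundant = global_penalty_reward(True, trim_redundant)

# The step-weighted penalty prefers trimming the less important steps.
s_critical = step_weighted_reward(True, trim_critical, importance)
s_redundant = step_weighted_reward(True, trim_redundant, importance)
```

Both trimmed chains total 300 tokens, so the global penalty cannot distinguish them, while the importance-weighted variant assigns the higher reward to the chain that keeps the critical step intact.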
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Neural Networks and Applications · Fuzzy Logic and Control Systems
