Does Your Reasoning Model Implicitly Know When to Stop Thinking?
Zixuan Huang, Xin Xia, Yuxi Ren, Jianbin Zheng, Xuanda Wang, Zhixia Zhang, Hongyan Xie, Songshi Liang, Zehao Chen, Xuefeng Xiao, Fuzhen Zhuang, Jianxin Li, Deqing Wang, Yikun Ban

TL;DR
This paper reveals that large reasoning models implicitly understand when to stop reasoning, and introduces SAGE, a sampling method that leverages this to improve reasoning efficiency and accuracy.
Contribution
The paper uncovers the implicit stopping knowledge of LRMs and proposes SAGE, a novel sampling paradigm that enhances reasoning efficiency and accuracy.
Findings
LRMs implicitly know when to stop reasoning.
SAGE improves reasoning efficiency and accuracy.
SAGE-RL enhances performance on mathematical benchmarks.
Abstract
Recent advancements in large reasoning models (LRMs) have greatly improved their capabilities on complex reasoning tasks through Long Chains of Thought (CoTs). However, this approach often results in substantial redundancy, impairing computational efficiency and causing significant delays in real-time applications. Recent studies show that longer reasoning chains are frequently uncorrelated with correctness and can even be detrimental to accuracy. In a further in-depth analysis of this phenomenon, we surprisingly uncover and empirically verify that LRMs implicitly know the appropriate time to stop thinking, while this capability is obscured by current sampling paradigms. Motivated by this, we introduce SAGE (Self-Aware Guided Efficient Reasoning), a novel sampling paradigm that unleashes this efficient reasoning potential. Furthermore, integrating SAGE as mixed sampling into group-based…
