SmartSwitch: Advancing LLM Reasoning by Overcoming Underthinking via Promoting Deeper Thought Exploration
Xichen Zhang, Sitong Wu, Haoru Tan, Shaozuo Yu, Yinghao Zhu, Ziyi He, Jiaya Jia

TL;DR
SmartSwitch is a reasoning framework that improves the complex reasoning of large language models by detecting underthinking and encouraging deeper exploration, yielding significant gains on mathematical reasoning benchmarks.
Contribution
It introduces a plug-and-play inference strategy that monitors and guides LLM reasoning to overcome shallow thinking and enhance problem-solving capabilities.
Findings
Significant performance improvements on mathematical reasoning benchmarks.
Effective detection and correction of underthinking during inference.
Applicable to various large language model sizes.
Abstract
The long chain-of-thought (LongCoT) capability is central to recent breakthroughs by large language models in complex reasoning tasks. However, the accompanying issue of "underthinking", where models reason shallowly by frequently switching thoughts without sufficient exploration, limits both performance and token efficiency. To address this problem, we propose a simple yet effective reasoning strategy: the SmartSwitch inference framework. This framework can be integrated into any large language model as a plug-and-play solution; it continuously monitors the model's reasoning process to detect underthinking and guides it toward deeper exploration of promising but overlooked thoughts. Specifically, the perception module identifies points where thoughts switch and evaluates the potential of the preceding thought using an off-the-shelf process reward model…
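The perception step described in the abstract (find thought-switch points, then score the thought that was abandoned) can be illustrated with a minimal sketch. It assumes switches are marked by explicit cue phrases and that the process reward model can be treated as a callable returning a score for a thought. `SWITCH_CUES`, `split_thoughts`, `find_promising_switches`, `threshold`, and the toy scorer are all hypothetical names and values for illustration, not the authors' implementation.

```python
import re
from typing import Callable, List, Tuple

# Hypothetical cue list for illustration; the paper's actual trigger set may differ.
SWITCH_CUES = ("Alternatively", "Wait, another approach", "Let me try a different")


def split_thoughts(trace: str, cues: Tuple[str, ...] = SWITCH_CUES) -> List[str]:
    """Segment a reasoning trace into thoughts, starting a new one at each switch cue."""
    # Zero-width lookahead keeps each cue at the start of the thought it introduces.
    pattern = "(?=" + "|".join(re.escape(c) for c in cues) + ")"
    return [t.strip() for t in re.split(pattern, trace) if t.strip()]


def find_promising_switches(
    trace: str,
    prm_score: Callable[[str], float],
    threshold: float = 0.7,
) -> List[Tuple[int, str, float]]:
    """Return (index, thought, score) for each abandoned thought scored at or above threshold.

    `prm_score` stands in for an off-the-shelf process reward model; `threshold`
    is a hypothetical cutoff, not a value taken from the paper.
    """
    thoughts = split_thoughts(trace)
    flagged = []
    # Every thought except the last is followed by a switch, i.e. it was abandoned.
    for i, thought in enumerate(thoughts[:-1]):
        score = prm_score(thought)
        if score >= threshold:
            flagged.append((i, thought, score))
    return flagged


def toy_prm(thought: str) -> float:
    """Toy stand-in scorer: pretends the factoring idea is the promising one."""
    return 0.9 if "factoring" in thought else 0.2


trace = "Try factoring x^2-5x+6. Alternatively, use the quadratic formula. Alternatively, guess roots."
print(find_promising_switches(trace, toy_prm))
# [(0, 'Try factoring x^2-5x+6.', 0.9)]
```

Only thoughts that are followed by a switch are scored, since the last thought in a trace has not been abandoned yet.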
Peer Reviews
Decision: Submitted to ICLR 2026
1. Simple, training-free control that can be bolted onto existing decoders; easy to prototype and tune. 2. Actionable diagnosis of underthinking, with concrete triggers and a backtracking-then-deepen intervention. 3. The method is motivated for SLMs, where a little extra test-time guidance can yield meaningful gains under tight cost/latency budgets; this connects well to the hybrid SLM+LLM literature (e.g., selective teacher hints [A]). It would be good to include related work in this area.
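The backtracking-then-deepen intervention mentioned in this review can be pictured as rebuilding the decoding prefix: keep the reasoning up to the end of the promising thought, discard what followed the premature switch, and append an instruction to keep exploring. This is a minimal sketch under that assumption; `DEEPEN_HINT` and `backtrack_and_deepen` are hypothetical, and the wording of the hint is invented for illustration rather than taken from the paper.

```python
from typing import List

# Hypothetical deepening instruction; the paper's actual intervention text is not reproduced here.
DEEPEN_HINT = "Wait, this line of reasoning looks promising. Let me explore it more carefully before switching."


def backtrack_and_deepen(thoughts: List[str], target: int, hint: str = DEEPEN_HINT) -> str:
    """Build a new decoding prefix: keep thoughts[0..target], drop the rest, append a hint.

    The returned string would be fed back to the model so that generation resumes
    inside the promising thought instead of after the premature switch.
    """
    if not 0 <= target < len(thoughts):
        raise IndexError("target must index an existing thought")
    kept = " ".join(thoughts[: target + 1])
    return f"{kept} {hint}"


thoughts = [
    "Try factoring x^2-5x+6.",
    "Alternatively, use the quadratic formula.",
    "Alternatively, guess roots.",
]
print(backtrack_and_deepen(thoughts, target=0))
```

Returning a plain prefix string keeps the sketch decoder-agnostic: any generation API that accepts a prompt could continue from it.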
1. Perception dependence on discourse style / tokenizer. Cue lists like 'Alternatively' are common in R1/Qwen-style verbose reasoning but appear far less in GPT-OSS or proprietary models whose analysis channel is terse or hidden. This risks under-triggering on compact styles and over-triggering on verbose ones. Token-level signals (entropy spikes) are mentioned but not deeply validated. Recommend evaluating more models outside the Qwen style. 2. Domain. Nearly all evidence is math-centric. It remai
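The token-level alternative the reviewer mentions (entropy spikes) could look roughly like the following sketch, which flags positions whose next-token entropy jumps well above a trailing window. The window size `window`, the multiplier `k`, and the function names are hypothetical choices for illustration; the paper's own token-level signal, if any, may be defined differently.

```python
import math
from statistics import mean, pstdev
from typing import List, Sequence


def token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def entropy_spikes(dists: List[Sequence[float]], window: int = 8, k: float = 2.0) -> List[int]:
    """Return indices whose entropy exceeds the trailing-window mean by more than k std devs."""
    ents = [token_entropy(d) for d in dists]
    spikes = []
    for i in range(window, len(ents)):
        history = ents[i - window:i]
        mu, sd = mean(history), pstdev(history)
        # Floor the std so a perfectly flat history does not make every tiny wobble a spike.
        if ents[i] > mu + k * max(sd, 1e-6):
            spikes.append(i)
    return spikes


confident = [0.97, 0.01, 0.01, 0.01]
uncertain = [0.25, 0.25, 0.25, 0.25]
dists = [confident] * 10 + [uncertain] + [confident] * 2
print(entropy_spikes(dists))
# [10]
```

Because it needs only output probabilities, a signal like this would not depend on a model producing particular discourse markers, which is the gap the reviewer points at.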
- Clarity: The paper is generally well written; the two-stage Perception/Intervention pipeline and pseudocode are easy to follow, and figures convey the intended workflow. - Problem framing: Surfacing underthinking (frequent, shallow thought switching) as a failure mode is an interesting observation that resonates with practitioner experience. - Empirical signal: On several math benchmarks and across multiple model sizes, the proposed workflow reports non-trivial gains over vanilla inference, so
1. Evidence that mainstream LongCoT models “underthink” is not yet solid. - Definition sensitivity. Underthinking is operationalized primarily by a length threshold on segmented “thoughts.” But different thought types (problem analysis, sanity check, verification, algebraic manipulation) naturally vary in token needs; short ≠ shallow. The current UF(L) metric therefore risks labeling by length, not by insufficient exploration. Please provide a typology of thought roles and token distributions p
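The review does not restate how UF(L) is computed. The sketch below assumes it is the fraction of segmented thoughts shorter than L tokens, and adds the per-role length breakdown the reviewer asks for. Both `underthinking_fraction` and `lengths_by_role` are hypothetical helpers, and the role labels are illustrative.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, List, Tuple


def underthinking_fraction(thought_lengths: List[int], L: int) -> float:
    """Assumed UF(L): share of thoughts whose token length falls below L."""
    if not thought_lengths:
        return 0.0
    return sum(1 for n in thought_lengths if n < L) / len(thought_lengths)


def lengths_by_role(labelled: List[Tuple[str, int]]) -> Dict[str, float]:
    """Median token length per thought role, e.g. 'verification' vs 'algebra'."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for role, n in labelled:
        groups[role].append(n)
    return {role: median(ns) for role, ns in groups.items()}


print(underthinking_fraction([10, 50, 200, 20], L=30))
# 0.5
print(lengths_by_role([("verification", 15), ("verification", 25), ("algebra", 120)]))
```

If a role such as verification has a median length well below L, a length-only UF(L) would count those thoughts as underthinking even when they are complete, which is the reviewer's "short ≠ shallow" concern.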
- Clearly defines and empirically validates the underthinking phenomenon across multiple models and difficulty levels. - Proposes a simple, training-free, and model-agnostic framework that is easy to integrate. - Demonstrates consistent and significant performance gains, especially for smaller models, and even improves efficiency (fewer tokens, faster inference).
- Reliance on explicit linguistic cues: The method detects thought switches only via predefined phrases (e.g., "Alternatively"), missing implicit or subtle shifts in reasoning. In contrast, concurrent work like SwiReasoning (arXiv:2510.05069) effectively handles implicit thought switching in latent space. - Strong dependence on external PRM quality: Performance hinges on an off-the-shelf Process Reward Model. If the PRM is poorly calibrated, biased, or lacks sufficient context length, SmartSwi
