Beyond Model Scaling: Test-Time Intervention for Efficient Deep Reasoning
Qianyue Wang, Jinwu Hu, Yufeng Wang, Huanxiang Lin, Bolin Chen, Zhiquan Wen, Yaofo Chen, Mingkui Tan

TL;DR
This paper introduces Think-with-Me, a test-time interactive reasoning paradigm that incorporates external feedback to improve the efficiency and accuracy of large reasoning models during multi-step reasoning tasks.
Contribution
It proposes a novel test-time intervention method using external feedback at transitional points to enhance reasoning efficiency and accuracy in large models.
Findings
Outperforms existing methods on AIME24 with 7.19% higher accuracy.
Reduces reasoning length by 81% under limited context windows.
Effective in security and creative reasoning tasks.
Abstract
Large Reasoning Models (LRMs) excel at multi-step reasoning but often suffer from inefficient reasoning processes like overthinking and overshoot, where excessive or misdirected reasoning increases computational cost and degrades performance. Existing efficient reasoning methods operate in a closed-loop manner, lacking mechanisms for external intervention to guide the reasoning process. To address this, we propose Think-with-Me, a novel test-time interactive reasoning paradigm that introduces external feedback intervention into the reasoning process. Our key insights are that (1) transitional conjunctions serve as natural points for intervention, signaling phases of self-validation or exploration, and (2) using transitional words appropriately to prolong the reasoning enhances performance, while excessive use degrades it. Building on these insights, Think-with-Me pauses reasoning at…
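The idea of treating transitional words as intervention points can be illustrated with a minimal Python sketch. It is not the authors' implementation: the word list `TRANSITION_WORDS`, the helper names `find_intervention_points` and `reason_with_feedback`, and the `should_continue` callback (standing in for an LLM proxy or a human giving feedback) are all assumptions made for illustration, and the reasoning trace is a hard-coded list of segments rather than live model output.

```python
import re
from typing import Callable, List, Sequence

# Hypothetical word list: the source does not enumerate the transitional
# words it uses, so these are illustrative only.
TRANSITION_WORDS = ("wait", "however", "but", "alternatively", "so", "therefore")

_PATTERN = re.compile(r"\b(" + "|".join(TRANSITION_WORDS) + r")\b", re.IGNORECASE)


def find_intervention_points(text: str) -> List[int]:
    """Return the character offsets at which a transitional word starts."""
    return [m.start() for m in _PATTERN.finditer(text)]


def reason_with_feedback(
    segments: Sequence[str],
    should_continue: Callable[[str], bool],
) -> List[str]:
    """Replay reasoning segments, pausing before any segment that opens
    with a transitional word and asking external feedback whether to go on.

    `should_continue` receives the trace kept so far and returns False to
    stop reasoning early (assumed interface, not taken from the paper).
    """
    kept: List[str] = []
    for segment in segments:
        opens_with_transition = _PATTERN.match(segment.lstrip()) is not None
        if kept and opens_with_transition:
            if not should_continue(" ".join(kept)):
                break  # external feedback says the trace is already sufficient
        kept.append(segment)
    return kept


if __name__ == "__main__":
    trace = [
        "Solve 2x = 4.",
        "So x = 2.",
        "Wait, let me double-check by substitution.",
        "However, maybe try another method.",
    ]
    # Toy feedback rule: stop as soon as the answer appears in the trace.
    kept = reason_with_feedback(trace, lambda ctx: "x = 2" not in ctx)
    print(kept)
```

In this toy run the feedback callback halts the trace at the first transitional word that follows the answer, so the two redundant verification segments are dropped. A real system would replace the callback with a call to a feedback model or a human prompt and would operate on streamed tokens rather than pre-split segments.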
Peer Reviews
Decision·Submitted to ICLR 2026
1. The study fully adheres to a rigorous logical chain of scientific exploration. The team first systematically analyzed the model’s reasoning behavior, identified that "transitional words" can serve as intervention nodes, and validated their effectiveness. This rigorous preliminary exploration laid a theoretical foundation for the design of Think-with-Me, forming a scientific closed loop. 2. The experimental validation is comprehensive, covering a wide range of tasks and comparing with various ma
1. Although the authors provide additional experimental details in the appendix, they do not release the source code, resulting in low reproducibility. 2. This approach relies on an external feedback mechanism, which may introduce new risks. If the LLM proxy generates incorrect feedback and the target model lacks a built-in correction mechanism, performance degradation or error propagation could occur; the authors offer no further discussion on this issue.
1. The paper is well-written, clear, and easy to follow. 2. The proposed method achieves a significant improvement in compressing reasoning length. 3. It provides both LLM-based and human-in-the-loop feedback mechanisms, enhancing the scalability of the approach.
1. Several existing works, for example [1], have already explored, from various perspectives, the use of external feedback frameworks (such as human feedback, model feedback, or verifiers) to improve LLM training, and these approaches can be adapted to address the long2short problem. Therefore, the empirical motivation of this paper needs to be further strengthened. 2. The paper's two key observations are not new: the first observation has been similarly articulated in numerous works since [2], and
* This paper is well-written and easy to understand. * This paper addresses an important issue in optimizing the test-time reasoning efficiency.
1. My main criticism is **on the design of the experiments**: a. In the observation experiments of Section 3.1 in Figure 1, the authors conduct their analysis on DeepSeek-R1-Distill-Qwen-32B, showing in Figures 1(a) and 1(b) the prevalence of conjunction tokens in the reasoning traces of **o1-like models**. However, when investigating the influence of these tokens, they switch their experimental model to Qwen2.5-72B-Instruct, which, as far as I know, does **not possess long reasoning ability*
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Multimodal Machine Learning Applications · Reinforcement Learning in Robotics
