Thinking Long, but Short: Stable Sequential Test-Time Scaling for Large Reasoning Models
Michael R. Metel, Yufei Cui, Boxing Chen, and Prasanna Parthasarathi

TL;DR
This paper introduces Min-Seek, a novel test-time scaling method that stabilizes and improves the accuracy of large reasoning models during sequential reasoning. It requires no fine-tuning and remains efficient.
Contribution
Min-Seek is a new sequential scaling technique that improves accuracy, stabilizes reasoning, and extends reasoning beyond the model's maximum context length, all without additional fine-tuning.
Findings
Significant accuracy improvements across various reasoning tasks.
Enhanced stability in sequential reasoning processes.
Ability to reason beyond maximum context length efficiently.
Abstract
Sequential test-time scaling is a promising training-free method to improve large reasoning model accuracy, but as currently implemented, significant limitations have been observed. Inducing models to think for longer can increase their accuracy, but as the length of reasoning is further extended, it has also been shown to result in accuracy degradation and model instability. This work presents a novel sequential test-time scaling method, Min-Seek, which improves model accuracy significantly over a wide range of induced thoughts, stabilizing the accuracy of sequential scaling, and removing the need for reasoning length fine-tuning. Beyond improving model accuracy over a variety of reasoning tasks, our method is inherently efficient, as only the KV pairs of one additional induced thought are kept in the KV cache during reasoning. With a custom KV cache which stores keys without position…
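The abstract's efficiency claim, that only the KV pairs of one additional induced thought are held in the cache, can be illustrated with a toy sketch. This is not the authors' implementation: the class `SingleThoughtCache`, the function `peak_cache_size`, and the rule of simply keeping the most recent induced thought are all hypothetical, and the excerpt does not say which thought Min-Seek retains or how key positions are handled. Floats stand in for real key and value tensors.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# One (key, value) pair per token; floats are placeholders for real tensors.
KVPair = Tuple[float, float]


@dataclass
class SingleThoughtCache:
    """Toy cache (hypothetical): a fixed prefix plus at most one induced thought."""

    prefix: List[KVPair] = field(default_factory=list)
    thought: List[KVPair] = field(default_factory=list)

    def append_prefix(self, kv: KVPair) -> None:
        self.prefix.append(kv)

    def start_thought(self) -> None:
        # Assumption: the previous induced thought's pairs are dropped
        # before a new induced thought begins.
        self.thought = []

    def append_thought(self, kv: KVPair) -> None:
        self.thought.append(kv)

    def __len__(self) -> int:
        return len(self.prefix) + len(self.thought)


def peak_cache_size(prefix_len: int, thought_len: int, n_thoughts: int) -> int:
    """Simulate n_thoughts induced thoughts and return the largest cache size seen."""
    cache = SingleThoughtCache()
    for i in range(prefix_len):
        cache.append_prefix((float(i), float(i)))
    peak = len(cache)
    for _ in range(n_thoughts):
        cache.start_thought()
        for j in range(thought_len):
            cache.append_thought((float(j), float(j)))
            peak = max(peak, len(cache))
    return peak


if __name__ == "__main__":
    # Peak size depends on the prefix and one thought, not on the number of thoughts.
    print(peak_cache_size(prefix_len=3, thought_len=4, n_thoughts=5))
    print(peak_cache_size(prefix_len=3, thought_len=4, n_thoughts=50))
```

Under these assumptions the peak cache size is the prefix length plus the length of a single thought, so it stays constant as more thoughts are induced, whereas a cache that kept every thought would grow linearly with their number.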
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Topic Modeling · Machine Learning in Healthcare
