Iterative Deepening Sampling as Efficient Test-Time Scaling
Weizhe Chen, Sven Koenig, Bistra Dilkina

TL;DR
This paper introduces an iterative deepening sampling method to improve self-correction in large language models, enhancing their reasoning performance on complex tasks by efficiently scaling test-time computation.
Contribution
It proposes a novel iterative deepening sampling framework that systematically triggers self-correction, leading to higher success rates on challenging reasoning benchmarks.
Findings
Improves success rate on Math500 and AIME benchmarks
Enhances self-reflection data quality for complex problem-solving
Demonstrates effectiveness through extensive ablation studies
Abstract
Recent reasoning models, such as OpenAI's O1 series, have demonstrated exceptional performance on complex reasoning tasks and revealed new test-time scaling laws. Inspired by this, many people have been studying how to train models to achieve effective self-evaluation and self-correction to further enable the scaling paradigm. However, less studied is how to efficiently scale test-time compute from a fixed model, and this remains a challenge. In this paper, we address this challenge by focusing on enhancing the quality of self-reflection data generation for complex problem-solving at test time, which can also subsequently improve the training of next-generation large language models (LLMs). Specifically, we explore how systematically triggering a model's self-correction mechanisms can improve performance on challenging reasoning tasks. To this end, we propose a novel iterative deepening…
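The abstract is truncated before it describes the algorithm, so the following is only a minimal sketch of what "systematically triggering self-correction" under an iterative deepening schedule could look like, not the authors' implementation. It assumes generation proceeds in rounds whose token budgets grow geometrically, with a reflection trigger appended between rounds. The names `budget_schedule`, `iterative_deepening_sample`, the `generate` callback, the trigger sentence, and all default values are hypothetical.

```python
from typing import Callable, List


def budget_schedule(initial: int, growth: int, max_budget: int) -> List[int]:
    """Return geometrically growing per-round budgets, capped at max_budget."""
    if initial < 1 or growth < 2:
        raise ValueError("initial must be >= 1 and growth must be >= 2")
    budgets = []
    budget = initial
    while budget <= max_budget:
        budgets.append(budget)
        budget *= growth
    return budgets


def iterative_deepening_sample(
    prompt: str,
    generate: Callable[[str, int], str],  # hypothetical: (context, budget) -> continuation
    initial_budget: int = 256,            # assumed default, not from the paper
    growth: int = 2,                      # assumed default, not from the paper
    max_budget: int = 4096,               # assumed default, not from the paper
    trigger: str = " Wait, let me re-check the reasoning so far.",  # assumed wording
) -> str:
    """Generate in rounds, appending a self-correction trigger between rounds."""
    budgets = budget_schedule(initial_budget, growth, max_budget)
    context = prompt
    for round_index, budget in enumerate(budgets):
        # Let the model continue from the current context within this round's budget.
        context += generate(context, budget)
        # After every round except the last, nudge the model to reflect.
        if round_index < len(budgets) - 1:
            context += trigger
    # Return only the generated part, without the original prompt.
    return context[len(prompt):]


if __name__ == "__main__":
    # Stand-in for a real model call: it just records the budget it was given.
    def fake_generate(context: str, budget: int) -> str:
        return f"[{budget} tokens]"

    print(iterative_deepening_sample("Q: 1+1?", fake_generate, 4, 2, 16, " <reflect> "))
```

In practice `generate` would wrap a call to a fixed language model. The geometric schedule keeps the number of trigger insertions logarithmic in the total budget, which is one plausible reading of "efficiently scaling test-time computation".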
Taxonomy
Topics: Speech Recognition and Synthesis · Topic Modeling
