Iterative Deepening Sampling as Efficient Test-Time Scaling

Weizhe Chen; Sven Koenig; Bistra Dilkina

arXiv:2502.05449 · cs.CL · June 3, 2025


TL;DR

This paper introduces an iterative deepening sampling method to improve self-correction in large language models, enhancing their reasoning performance on complex tasks by efficiently scaling test-time computation.

Contribution

The paper proposes a novel iterative deepening sampling framework that systematically triggers the model's self-correction, raising success rates on challenging reasoning benchmarks.

Findings

1. Improves success rates on the Math500 and AIME benchmarks
2. Enhances the quality of self-reflection data for complex problem-solving
3. Demonstrates effectiveness through extensive ablation studies

Abstract

Recent reasoning models, such as OpenAI's O1 series, have demonstrated exceptional performance on complex reasoning tasks and revealed new test-time scaling laws. Inspired by this, many people have been studying how to train models to achieve effective self-evaluation and self-correction to further enable the scaling paradigm. However, less studied is how to efficiently scale test-time compute from a fixed model, and this remains a challenge. In this paper, we address this challenge by focusing on enhancing the quality of self-reflection data generation for complex problem-solving at test time, which can also subsequently improve the training of next-generation large language models (LLMs). Specifically, we explore how systematically triggering a model's self-correction mechanisms can improve performance on challenging reasoning tasks. To this end, we propose a novel iterative deepening…
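The abstract is cut off before it describes the algorithm, so what follows is only a hypothetical Python sketch of the general idea the name suggests, not the authors' implementation. The sketch runs generation in rounds over one transcript, with a token budget that grows geometrically, and appends a self-reflection trigger between rounds to prompt the model to re-examine its partial solution. The doubling schedule, the `"Wait,"` trigger string, the `generate` and `accept` callables, and every parameter name are illustrative assumptions.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical model interface (an assumption, not the paper's API):
# given the transcript so far and a token budget, return up to `budget`
# newly generated tokens.
Generate = Callable[[str, int], List[str]]


def iterative_deepening_sample(
    prompt: str,
    generate: Generate,
    initial_budget: int = 256,
    growth: int = 2,
    max_rounds: int = 4,
    trigger: str = "Wait,",
    accept: Optional[Callable[[str], bool]] = None,
) -> Tuple[str, int]:
    """Illustrative iterative-deepening loop over a token budget.

    Each round extends the same transcript under a budget that grows
    geometrically; between rounds a self-reflection trigger is appended.
    Returns the final transcript and the number of model-generated tokens
    (the inserted trigger text is not counted as generated).
    """
    transcript = prompt
    budget = initial_budget
    generated = 0
    for round_idx in range(max_rounds):
        tokens = generate(transcript, budget)
        generated += len(tokens)
        if tokens:
            transcript += " " + " ".join(tokens)
        # Optional early stop, e.g. a hypothetical answer checker.
        if accept is not None and accept(transcript):
            break
        # Between rounds (never after the last one), inject the trigger
        # phrase and deepen: the next round receives a larger budget.
        if round_idx < max_rounds - 1:
            transcript += " " + trigger
            budget *= growth
    return transcript, generated


def fake_generate(text: str, budget: int) -> List[str]:
    # Stand-in for an LLM call: emits exactly `budget` placeholder tokens.
    return ["tok"] * budget


text, used = iterative_deepening_sample("Q:", fake_generate, initial_budget=2, max_rounds=3)
print(used)  # 14 (budgets 2 + 4 + 8)
```

This sketch continues a single transcript instead of restarting from the prompt each round, so earlier reasoning stays visible when the trigger asks the model to reconsider it. A restart-based variant would instead resample from scratch with the larger budget.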


Taxonomy

Topics: Speech Recognition and Synthesis · Topic Modeling