Scaling Test-Time Compute Without Verification or RL is Suboptimal

Amrith Setlur; Nived Rajaraman; Sergey Levine; Aviral Kumar

arXiv:2502.12118·cs.LG·February 19, 2025

TL;DR

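This paper shows that finetuning large language models with verifier-based methods (RL or search guided by verification) outperforms verifier-free methods (distilling or cloning search traces) under a fixed compute/data budget, and that the gap grows as test-time compute and training data scale, especially for heterogeneous solution distributions.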

Contribution

The paper proves that, for a fixed compute/data budget, verifier-based finetuning is superior to verifier-free finetuning when scaling test-time compute for large language models.

Findings

01

Verifier-based methods outperform verifier-free methods as compute scales.

02

The performance gap widens as test-time budgets and training data grow.

03

The claims are validated empirically on multiple reasoning tasks and model sizes.

Abstract

Despite substantial advances in scaling test-time compute, an ongoing debate in the community is how it should be scaled up to enable continued and efficient improvements with scaling. There are largely two approaches: first, distilling successful search or thinking traces; and second, using verification (e.g., 0/1 outcome rewards, reward models, or verifiers) to guide reinforcement learning (RL) and search algorithms. In this paper, we prove that finetuning LLMs with verifier-based (VB) methods based on RL or search is far superior to verifier-free (VF) approaches based on distilling or cloning search traces, given a fixed amount of compute/data budget. Further, we show that as we scale test-time compute (measured as the output token length) and training data, suboptimality of VF methods scales poorly compared to VB when the base pre-trained LLM presents a heterogeneous distribution…
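
To make the distinction between the two families concrete, the toy simulation below contrasts drawing a single sample with no verification against drawing several samples and keeping one that a 0/1 outcome check accepts (best-of-N). It is only an illustrative sketch and not the paper's analysis or training procedure: the function names, the fixed per-sample success probability `p_correct`, and the assumption of a perfect verifier are all hypothetical choices made for this example.

```python
import random


def sample_is_correct(p_correct, rng):
    """Hypothetical stand-in for one model sample: True if the answer is correct."""
    return rng.random() < p_correct


def single_sample_accuracy(p_correct, trials, rng):
    """No verification: draw one answer per problem and report it as-is."""
    hits = sum(sample_is_correct(p_correct, rng) for _ in range(trials))
    return hits / trials


def best_of_n_accuracy(p_correct, n, trials, rng):
    """Verification: draw up to n answers per problem; an (assumed perfect)
    0/1 outcome verifier lets us return a correct one if any sample is correct."""
    hits = 0
    for _ in range(trials):
        if any(sample_is_correct(p_correct, rng) for _ in range(n)):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    rng = random.Random(0)
    p = 0.2  # assumed per-sample success rate, chosen for illustration only
    print("single sample:", single_sample_accuracy(p, 20_000, rng))
    for n in (2, 4, 8, 16):
        print(f"best-of-{n}:", best_of_n_accuracy(p, n, 20_000, rng))
```

Under these assumptions, best-of-N accuracy approaches 1 - (1 - p)^N, so extra test-time samples pay off only because an outcome signal is available to pick among them. The single-sample path has no way to use additional draws.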

Taxonomy

Topics: Embedded Systems Design Techniques · Parallel Computing and Optimization Techniques · Software Testing and Debugging Techniques

Methods: Balanced Selection