What If We Allocate Test-Time Compute Adaptively?
Ahsan Bilal, Ahmed Mohsin, Muhammad Umer, Ali Subhan, Hassan Rizwan, Ayesha Mohsin, Dean Hougen

TL;DR
This paper introduces a verifier-guided adaptive inference framework that dynamically allocates test-time compute, improving reasoning efficiency and accuracy over uniform scaling methods across multiple benchmarks.
Contribution
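It treats reasoning as iterative trajectory generation and selection, using a process reward model (PRM) as a single control signal: step-level scores guide pruning and expansion within an iteration, and aggregated trajectory rewards select the final response across iterations.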
Findings
Outperforms uniform test-time scaling on multiple datasets
Achieves large gains on MATH-500 and AIME24
Demonstrates efficiency through FLOPs and compute intensity metrics
Abstract
Test-time compute scaling allocates inference computation uniformly, uses fixed sampling strategies, and applies verification only for reranking. In contrast, we propose a verifier-guided adaptive framework treating reasoning as iterative trajectory generation and selection. For each problem, the agent runs multiple inference iterations. In each iteration, it optionally produces a high-level plan, selects a set of reasoning tools and a compute strategy together with an exploration parameter, and then generates a candidate reasoning trajectory. A process reward model (PRM) serves as a unified control signal: within each iteration, step-level PRM scores are aggregated to guide pruning and expansion during generation, and across iterations, aggregated trajectory rewards are used to select the final response. Across datasets, our dynamic, PRM-guided approach consistently outperforms direct…
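The control loop described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `Trajectory`, `generate_trajectory`, `solve`, `propose`, and `prm` are hypothetical, `beam_width` stands in for the exploration parameter, `prune_below` is an assumed pruning threshold, and the mean of step scores is an assumed aggregation rule. Planning and tool selection are omitted.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List, Optional, Sequence

# Hypothetical callables (assumptions, not from the paper):
#   propose(prefix, k) -> up to k candidate next steps given the steps so far
#   prm(prefix, step)  -> step-level reward in [0, 1] for appending `step`
Propose = Callable[[Sequence[str], int], List[str]]
PRM = Callable[[Sequence[str], str], float]


@dataclass
class Trajectory:
    steps: List[str]
    step_scores: List[float]

    @property
    def reward(self) -> float:
        # Assumed aggregation: mean of step-level PRM scores.
        return mean(self.step_scores) if self.step_scores else 0.0


def generate_trajectory(propose: Propose, prm: PRM, max_steps: int,
                        beam_width: int, prune_below: float) -> Trajectory:
    """One iteration: expand candidates, keep the best, prune weak paths."""
    steps: List[str] = []
    scores: List[float] = []
    for _ in range(max_steps):
        candidates = propose(steps, beam_width)  # expansion
        if not candidates:
            break
        scored = [(prm(steps, c), c) for c in candidates]
        best_score, best_step = max(scored, key=lambda pair: pair[0])
        if best_score < prune_below:  # pruning: stop extending this path
            break
        steps.append(best_step)
        scores.append(best_score)
    return Trajectory(steps, scores)


def solve(propose: Propose, prm: PRM,
          configs: Sequence[Dict[str, float]]) -> Optional[Trajectory]:
    """Run one iteration per config; return the highest-reward trajectory."""
    best: Optional[Trajectory] = None
    for cfg in configs:
        traj = generate_trajectory(propose, prm, **cfg)
        if best is None or traj.reward > best.reward:
            best = traj
    return best
```

Each entry in `configs` is a dictionary with the keys `max_steps`, `beam_width`, and `prune_below`, so varying the entries gives each iteration a different compute budget. In a real system `propose` would wrap a language model call and `prm` a trained process reward model.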
