Adaptive Test-Time Compute Allocation via Learned Heuristics over Categorical Structure
Shuhui Qu

TL;DR
This paper introduces a learned-heuristics framework for adaptive test-time compute allocation in large language models, using 44% fewer verifier calls than baseline approaches while improving reasoning accuracy on the MATH benchmark.
Contribution
It proposes a novel state-level selective verification method combining gating, ranking, and adaptive verifier allocation to optimize reasoning efficiency.
Findings
Achieves higher accuracy than existing methods on the MATH benchmark.
Uses 44% fewer verifier calls compared to baseline approaches.
Directs verification effort to the most informative intermediate states.
Abstract
Test-time computation has become a primary driver of progress in large language model (LLM) reasoning, but it is increasingly bottlenecked by expensive verification. In many reasoning systems, a large fraction of verifier calls are spent on redundant or unpromising intermediate hypotheses. We study reasoning under a verification-cost-limited setting and ask how verification effort should be allocated across intermediate states. We propose a state-level selective verification framework that combines (i) deterministic feasibility gating over a structured move interface, (ii) pre-verification ranking using a hybrid of learned state-distance and residual scoring, and (iii) adaptive allocation of verifier calls based on local uncertainty. Unlike solution-level best-of-N or uniform intermediate verification, our method distributes verification where it is most informative. On the…
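The three stages named in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the `Candidate` type, the precomputed `feasible` flag and `score` field, the near-tie `margin`, and the call limits are all hypothetical stand-ins chosen for illustration. In particular, a single precomputed score replaces the hybrid of learned state-distance and residual scoring, and "local uncertainty" is approximated by counting candidates whose score is close to the best one.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Candidate:
    """A hypothetical intermediate reasoning state."""
    state: str
    feasible: bool  # assumed outcome of a deterministic feasibility check
    score: float    # assumed pre-verification ranking score (higher is better)


def select_for_verification(
    candidates: Sequence[Candidate],
    min_calls: int = 1,
    max_calls: int = 4,
    margin: float = 0.1,
) -> List[Candidate]:
    """Return the candidates that would be sent to the expensive verifier."""
    # (i) Gating: discard infeasible states before any verifier call.
    feasible = [c for c in candidates if c.feasible]
    if not feasible:
        return []

    # (ii) Ranking: order the survivors by their pre-verification score.
    ranked = sorted(feasible, key=lambda c: c.score, reverse=True)

    # (iii) Adaptive allocation: many near-ties with the top score are treated
    # as high local uncertainty and earn more verifier calls (an assumption).
    best = ranked[0].score
    near_ties = sum(1 for c in ranked if best - c.score <= margin)
    budget = max(min_calls, min(max_calls, near_ties))
    return ranked[:budget]


candidates = [
    Candidate("a", feasible=True, score=0.90),
    Candidate("b", feasible=False, score=0.95),
    Candidate("c", feasible=True, score=0.85),
    Candidate("d", feasible=True, score=0.30),
]
selected = [c.state for c in select_for_verification(candidates)]
```

In this toy input, the infeasible state is dropped without a verifier call, and only the two closely scored survivors are selected. A clear single leader would receive just one call, which is the sense in which the budget adapts to the local score distribution.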
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications
