Adaptive Generate-Rank-Verify: Inference-Time Search with Costly Verification
Shaddin Dughmi, Mahdi Haghifam, Yusuf Hakan Kalayci

TL;DR
This paper introduces ADAP, an adaptive inference-time search algorithm for language-model pipelines that balances cheap scoring against costly verification in tasks such as mathematical reasoning and programming.
Contribution
It formalizes the generative active search problem, characterizes the optimal policy with known distributions, and proposes ADAP, a practical algorithm with provable near-optimality under certain assumptions.
Findings
ADAP reduces verification costs compared to fixed policies.
Experiments show ADAP outperforms non-adaptive and baseline methods.
Theoretical analysis confirms near-optimality under monotonicity assumptions.
Abstract
Many inference-time language-model pipelines combine a cheap reward signal with an expensive verifier, such as exact answer checking in mathematical reasoning or hidden-test execution in code generation. We formalize this setting using a learning-theoretic lens as generative active search: a cost-sensitive first-positive search problem in which a policy adaptively samples candidates from an unknown distribution, observes cheap scores, and pays for verifier labels until it finds a positive example. For a fixed prompt, the generator and reward model induce two unknown objects: a distribution over reward scores and a score-conditioned success function. When these quantities are known, we characterize the distribution-aware optimal policy using a dynamic programming approach. In the realistic and practical setting where both the score distribution and success function are unknown, we…
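To make the setting in the abstract concrete, here is a minimal simulation sketch of a first-positive search loop: draw a candidate, observe its cheap score, optionally pay for a verifier label, and stop at the first verified positive. Everything specific in it is an assumption for illustration and not taken from the paper: the uniform score distribution, the success function `success_prob`, the cost values, and the fixed-threshold rule. In particular, the threshold rule is a simple non-adaptive stand-in, not ADAP and not the paper's optimal policy.

```python
import random


def success_prob(score):
    """Hypothetical score-conditioned success function (monotone in the score)."""
    return score ** 2


def run_threshold_policy(tau, sample_cost=1.0, verify_cost=5.0,
                         max_samples=10_000, seed=0):
    """Simulate one search episode under a fixed score threshold tau.

    Each step draws a candidate (paying sample_cost) and observes its cheap
    score. Candidates scoring at least tau are sent to the verifier (paying
    verify_cost). The episode ends at the first verified positive.

    Returns (total_cost, n_sampled, n_verified, found).
    """
    rng = random.Random(seed)
    total_cost = 0.0
    n_sampled = 0
    n_verified = 0
    for _ in range(max_samples):
        total_cost += sample_cost
        n_sampled += 1
        score = rng.random()  # assumed score distribution: Uniform[0, 1)
        if score < tau:
            continue  # skip verification for low-scoring candidates
        total_cost += verify_cost
        n_verified += 1
        if rng.random() < success_prob(score):  # simulated verifier label
            return total_cost, n_sampled, n_verified, True
    return total_cost, n_sampled, n_verified, False


if __name__ == "__main__":
    for tau in (0.0, 0.5, 0.8):
        cost, sampled, verified, found = run_threshold_policy(tau)
        print(f"tau={tau}: cost={cost}, sampled={sampled}, "
              f"verified={verified}, found={found}")
```

Sweeping `tau` in this toy model exposes the trade-off the abstract describes: a low threshold wastes verifier calls on unlikely candidates, while a high threshold spends more on sampling before anything is verified.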
