Trust but Verify! A Survey on Verification Design for Test-time Scaling
V Venktesh, Mandeep Rathee, Avishek Anand

TL;DR
This survey reviews various verification methods used in test-time scaling of large language models, categorizing their training mechanisms and utility to improve inference performance.
Contribution
It provides a comprehensive categorization and analysis of verifier training approaches in test-time scaling for LLMs, which was lacking in prior literature.
Findings
Diverse verifier types include prompt-based, discriminative, and generative models.
Verification approaches enhance LLM performance by guiding exploration of the decoding search space.
The survey offers a unified view and a repository of verification methods.
Abstract
Test-time scaling (TTS) has emerged as a new frontier for scaling the performance of Large Language Models (LLMs). In test-time scaling, LLMs use more computational resources during inference to improve their reasoning process and task performance. Several approaches to TTS have emerged, such as distilling reasoning traces from another model or exploring the vast decoding search space by employing a verifier. Verifiers serve as reward models that score candidate outputs from the decoding process, helping to diligently explore the vast solution space and select the best outcome. This paradigm has emerged as a superior approach owing to parameter-free scaling at inference time and high performance gains. Verifiers can be prompt-based, or fine-tuned as discriminative or generative models, to verify process paths, outcomes, or both. Despite their widespread adoption,…
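The verifier-as-reward-model idea in the abstract can be illustrated with a minimal best-of-N selection sketch. This is not the survey's method or any specific system it covers: the names `best_of_n` and `toy_verifier` are hypothetical, and the toy scoring rule stands in for a real prompt-based or fine-tuned verifier.

```python
from typing import Callable, Sequence


def best_of_n(candidates: Sequence[str], verifier: Callable[[str], float]) -> str:
    """Return the candidate that the verifier scores highest (outcome-level selection)."""
    if not candidates:
        raise ValueError("need at least one candidate")
    # max() keeps the first candidate among ties.
    return max(candidates, key=verifier)


def toy_verifier(candidate: str) -> float:
    """Hypothetical stand-in for a learned verifier.

    Assumption for illustration only: the question is "17 + 25 = ?",
    so an answer ending in "42" gets reward 1.0 and anything else gets 0.0.
    """
    return 1.0 if candidate.strip().endswith("42") else 0.0


# Candidate outputs, as if sampled from an LLM's decoding process.
candidates = ["17 + 25 = 32", "17 + 25 = 42", "17 + 25 = 52"]
best = best_of_n(candidates, toy_verifier)
print(best)
```

In this sketch, spending more inference compute corresponds to sampling more candidates before selection. A process-level verifier would instead score intermediate reasoning steps rather than only the final output.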
