Scaling Flaws of Verifier-Guided Search in Mathematical Reasoning
Fei Yu, Yingru Li, Benyou Wang

TL;DR
Verifier-guided search for mathematical reasoning with large language models is fundamentally limited by verifier failures: its advantage over repeated sampling shrinks as sample size grows, and it eventually underperforms repeated sampling, especially on challenging problems.
Contribution
This paper identifies and analyzes scaling flaws in verifier-guided search, revealing its limitations across models, benchmarks, and verifiers, and explores preliminary mitigation strategies.
Findings
Verifier failures cause verifier-guided search to underperform repeated sampling at larger sample sizes.
Verifier-guided search's advantages diminish with increasing sample size.
Reducing reliance on verifiers shows potential as a mitigation approach.
Abstract
Large language models (LLMs) struggle with multi-step reasoning, where inference-time scaling has emerged as a promising strategy for performance improvement. Verifier-guided search outperforms repeated sampling when sample size is limited by selecting and prioritizing valid reasoning paths. However, we identify a critical limitation: scaling flaws, prevalent across different models (Mistral 7B and DeepSeekMath 7B), benchmarks (GSM8K and MATH), and verifiers (outcome value models and process reward models). As sample size increases, verifier-guided search exhibits diminishing advantages and eventually underperforms repeated sampling. Our analysis attributes this to verifier failures, where imperfect verifiers misrank candidates and erroneously prune all valid paths. These issues are further exacerbated in challenging and out-of-distribution problems, restricting search effectiveness. To…
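The failure mode described in the abstract, where an imperfect verifier misranks candidates and prunes every valid path, can be illustrated with a toy sketch. This is not the paper's implementation. The `Path` class, the `prune` and `any_valid` helpers, and all scores below are hypothetical values chosen only to show the mechanism.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    name: str
    is_valid: bool          # ground truth, unknown to the search procedure
    verifier_score: float   # hypothetical score from an imperfect verifier


def prune(paths, keep):
    """Keep the `keep` highest-scoring paths, as a verifier-guided search step might."""
    return sorted(paths, key=lambda p: p.verifier_score, reverse=True)[:keep]


def any_valid(paths):
    """Return True if at least one path in the pool is actually valid."""
    return any(p.is_valid for p in paths)


# Illustrative candidate pool: the only valid path ("c") is misranked
# below two invalid ones by the (assumed) imperfect verifier.
candidates = [
    Path("a", False, 0.91),
    Path("b", False, 0.88),
    Path("c", True, 0.62),
    Path("d", False, 0.40),
]

# Verifier-guided pruning keeps only the top-2 paths by score.
survivors = prune(candidates, keep=2)

print(any_valid(candidates))  # the unpruned pool still contains a valid path
print(any_valid(survivors))   # the pruned pool does not
```

In this sketch, repeated sampling corresponds to keeping the whole pool, which still contains the valid path, while the pruned search has discarded it and cannot recover it no matter how many further steps it takes.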
Taxonomy
Topics: Evolutionary Algorithms and Applications · AI-based Problem Solving and Planning · Intelligent Tutoring Systems and Adaptive Learning
