More Test-Time Compute Can Hurt: Overestimation Bias in LLM Beam Search

Gal Dalal; Assaf Hallak; Gal Chechik; Yftah Ziser

arXiv:2603.15377 · cs.LG · March 18, 2026

TL;DR

This paper shows that increasing the beam width in LLM beam search can harm output quality, because noise in the scorer induces an overestimation bias. It also provides a theoretical framework for choosing the beam width from the scorer's signal-to-noise ratio.

Contribution

The paper introduces a novel analysis based on Extreme Value Theory that explains when wider beam search degrades performance and offers practical diagnostics for optimal beam width selection.
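The diagnostics themselves are not spelled out on this page. As a purely hypothetical illustration, one could estimate the quality gap Δ and the scorer noise σ from scorer outputs on paths already labeled correct or incorrect, then plug them into a Gaussian extreme-value heuristic, $\hat{k} \approx \exp((\Delta/\sigma)^2/2)$. The function names, the equal-weight pooled-variance estimator, and the constant 1/2 are our assumptions, not the paper's method; note also that the within-group spread of scores mixes scorer noise with genuine quality variation, so σ is likely overestimated.

```python
import math
from statistics import fmean, variance


def estimate_snr(correct_scores, incorrect_scores):
    """Hypothetical diagnostic: estimate (Delta, sigma) from labeled scorer outputs.

    Delta is the gap between mean scores of correct and incorrect paths.
    sigma pools the two sample variances with equal weight, assuming both
    groups share a single scorer-noise level (an assumption, not the paper's).
    """
    delta = fmean(correct_scores) - fmean(incorrect_scores)
    sigma = math.sqrt((variance(correct_scores) + variance(incorrect_scores)) / 2)
    return delta, sigma


def heuristic_max_width(delta: float, sigma: float) -> float:
    """Assumed Gaussian extreme-value heuristic: k_hat ~ exp((delta / sigma)^2 / 2).

    Returns a real-valued width; floor it to get an integer beam width.
    """
    if delta <= 0:
        return 1.0  # no quality advantage, so widening cannot help under this heuristic
    if sigma <= 0:
        return math.inf  # noiseless scorer: the heuristic places no upper limit
    exponent = 0.5 * (delta / sigma) ** 2
    # Guard against float overflow in math.exp for very high SNR.
    return math.inf if exponent > 700 else math.exp(exponent)


if __name__ == "__main__":
    correct = [0.9, 1.1, 1.0, 0.8, 1.2]       # made-up scores on correct paths
    incorrect = [0.1, -0.1, 0.0, 0.2, -0.2]   # made-up scores on incorrect paths
    d, s = estimate_snr(correct, incorrect)
    print(f"Delta={d:.3f} sigma={s:.3f} SNR={d / s:.2f} k_hat={heuristic_max_width(d, s):.3g}")
```

The exponential dependence means a modest change in SNR moves the heuristic width by orders of magnitude, so in practice the estimate of σ matters far more than the estimate of Δ's last digit.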

Findings

01. Overestimation bias grows with the candidate pool size.

02. The optimal beam width depends on the scorer's signal-to-noise ratio.

03. Perplexity-guided beam search shows diminishing benefits at any width, while PRM-guided beam search improves with wider beams.
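Finding 01 can be reproduced in a toy setting. The sketch below assumes (our simplification, not the paper's experimental setup) that k candidates all have the same true quality 0 and that the scorer adds independent N(0, σ²) noise to each. Keeping the top-scored candidate then overestimates its true quality by the expected maximum of the noise, which grows with k; for large k it behaves roughly like σ√(2 ln k), an asymptotic form that overstates the bias at small k. The function `selection_bias` is hypothetical.

```python
import math
import random
from statistics import fmean


def selection_bias(k: int, sigma: float = 1.0, trials: int = 5000, seed: int = 0) -> float:
    """Monte Carlo estimate of how much the top noisy score overestimates true quality.

    Toy model (assumption): k candidates all have true quality 0, and the
    scorer reports quality + N(0, sigma^2) noise, independently per candidate.
    """
    rng = random.Random(seed)
    # Each trial scores k equally good candidates and keeps the best score,
    # mimicking selection of the top-scored candidate from the pool.
    top_scores = [max(rng.gauss(0.0, sigma) for _ in range(k)) for _ in range(trials)]
    # True quality is 0, so the mean selected score is the overestimation bias.
    return fmean(top_scores)


if __name__ == "__main__":
    for k in (1, 4, 16, 64):
        print(f"k={k:>2}  bias={selection_bias(k):.3f}  "
              f"sqrt(2 ln k)={math.sqrt(2 * math.log(k)):.3f}")
```

Because the noise enters additively, doubling σ doubles the bias at every k in this toy model, which is one way to see why the scorer's noise level relative to Δ governs how far widening pays off.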

Abstract

Wider beam search should improve LLM reasoning, but when should you stop widening? Prior work on beam width selection has focused on inference efficiency [qin2025dsbd, freitag2017beam], without analyzing whether wider search can hurt output quality. We present an analysis, grounded in Extreme Value Theory, that answers this question. Beam selection over noisy scorer outputs introduces a systematic overestimation bias that grows with the candidate pool size, and we derive a maximum useful beam width $\hat{k}$ beyond which search degrades performance. This critical width depends on the signal-to-noise ratio of the scorer: $\hat{k}$ grows exponentially with $(\Delta/\sigma)^{2}$, where $\Delta > 0$ is the quality advantage of correct paths over incorrect ones and $\sigma$ is the scorer noise. We validate this theory by comparing perplexity-guided and PRM-guided beam search…
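A back-of-the-envelope version of this scaling can be written down under simplifying assumptions that are ours, not necessarily the paper's: one correct candidate with true quality $\Delta$, $k-1$ incorrect candidates with true quality 0, and i.i.d. Gaussian scorer noise. Ignoring the correct candidate's own noise and using the classical Gaussian extreme-value approximation $\mathbb{E}[\max] \approx \sigma\sqrt{2\ln k}$ for large $k$, the best incorrect score catches up with $\Delta$ at

```latex
\begin{align*}
  S_{\mathrm{correct}} &= \Delta + \varepsilon_0, \qquad
  S_i = \varepsilon_i \quad (i = 1, \dots, k-1), \qquad
  \varepsilon_j \overset{\text{iid}}{\sim} \mathcal{N}(0, \sigma^2) \\
  \mathbb{E}\Big[\max_{1 \le i \le k-1} S_i\Big]
    &\approx \sigma \sqrt{2 \ln k} \qquad (k \text{ large}) \\
  \sigma \sqrt{2 \ln \hat{k}} = \Delta
    \quad &\Longrightarrow \quad
  \hat{k} = \exp\!\left( \tfrac{1}{2} \left( \Delta / \sigma \right)^{2} \right)
\end{align*}
```

This reproduces the qualitative claim that $\hat{k}$ grows exponentially with $(\Delta/\sigma)^2$; the constant $1/2$ is an artifact of this crude approximation, and the paper's exact expression may differ.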

Taxonomy

Topics: Natural Language Processing Techniques · Machine Learning and Data Classification · Machine Learning and Algorithms