Revisiting the (Sub)Optimality of Best-of-N for Inference-Time Alignment
Ved Sriraman, Adam Block

TL;DR
This paper analyzes Best-of-N sampling for language model alignment. It shows that Best-of-N is often optimal for win-rate, which helps explain its practical success, and it proposes a variant that prevents reward hacking.
Contribution
It demonstrates that properly tuned Best-of-N is both statistically and computationally optimal for win-rate, and introduces a variant that eliminates reward hacking.
Findings
When properly tuned, Best-of-N is optimal for win-rate under the paper's assumptions.
A simple variant can prevent reward hacking while maintaining performance.
Prior analyses focusing on expected true reward are less relevant for practical settings.
Abstract
Best-of-N (BoN) sampling is a widely used inference-time alignment method for language models, whereby N candidate responses are sampled from a reference model and the one with the highest predicted reward according to a learned reward model is selected. Despite its widespread practical use, recent theoretical work has suggested that it is statistically suboptimal and vulnerable to reward hacking, the process by which models exploit weaknesses in the learned reward model to achieve high estimated reward without genuinely improving performance. We revisit this question under assumptions that more closely reflect practice than those of prior work. In particular, in contradistinction to earlier analyses that focused on expected true reward, which may not be meaningful in many practical settings, we investigate how inference-time alignment affects the win-rate, a pairwise comparison-based…
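The abstract describes BoN in a single sentence and names win-rate as the metric of interest. The following is a minimal Python sketch under toy assumptions, not the authors' code. The reference model is a uniform sampler over four hypothetical responses, `TRUE_REWARD`, `LEARNED_REWARD` and `HACKED_REWARD` are hypothetical lookup tables, and win-rate is assumed to mean the probability that the BoN output beats an independent reference sample under the true reward, with ties counted as one half. The paper's exact win-rate definition and its reward-hacking-resistant variant are not reproduced here.

```python
import random
from typing import Callable, Dict

# Hypothetical toy setting: four possible responses with known true rewards.
RESPONSES = ["a", "b", "c", "d"]
TRUE_REWARD: Dict[str, float] = {"a": 0.0, "b": 1.0, "c": 2.0, "d": 3.0}
# Hypothetical learned reward model that ranks responses correctly.
LEARNED_REWARD: Dict[str, float] = dict(TRUE_REWARD)
# Hypothetical flawed reward model that overrates the worst response "a".
HACKED_REWARD: Dict[str, float] = {"a": 9.0, "b": 1.0, "c": 2.0, "d": 3.0}


def sample_ref(rng: random.Random) -> str:
    """Stand-in for the reference model: a uniform draw over RESPONSES."""
    return rng.choice(RESPONSES)


def best_of_n(
    sampler: Callable[[random.Random], str],
    reward_model: Callable[[str], float],
    n: int,
    rng: random.Random,
) -> str:
    """Draw n candidates from the sampler and return the one the learned
    reward model scores highest (ties go to the earliest candidate)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    candidates = [sampler(rng) for _ in range(n)]
    return max(candidates, key=reward_model)


def estimate_win_rate(
    policy: Callable[[random.Random], str],
    reference: Callable[[random.Random], str],
    true_reward: Callable[[str], float],
    trials: int,
    rng: random.Random,
) -> float:
    """Monte Carlo estimate of P(policy response beats an independent
    reference response) under the true reward, counting ties as 1/2."""
    score = 0.0
    for _ in range(trials):
        ra = true_reward(policy(rng))
        rb = true_reward(reference(rng))
        if ra > rb:
            score += 1.0
        elif ra == rb:
            score += 0.5
    return score / trials


def bon_policy(reward_table: Dict[str, float], n: int) -> Callable[[random.Random], str]:
    """Wrap best_of_n as a sampler that uses the given reward table."""
    return lambda rng: best_of_n(sample_ref, lambda y: reward_table[y], n, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    for name, table, n in [("n=1", LEARNED_REWARD, 1),
                           ("n=64", LEARNED_REWARD, 64),
                           ("n=64, hacked", HACKED_REWARD, 64)]:
        wr = estimate_win_rate(bon_policy(table, n), sample_ref,
                               lambda y: TRUE_REWARD[y], 20000, rng)
        print(f"{name}: estimated win-rate {wr:.3f}")
```

In this toy, `n=1` reduces BoN to the reference model, so the estimate should sit near 0.5. With `n=64` and the correctly ranked `LEARNED_REWARD`, BoN almost always returns "d", which beats three of the four reference responses and ties the fourth, so the estimate should approach 0.875. With `HACKED_REWARD`, BoN almost always returns "a" and the estimate should fall toward 0.125, a simple picture of how a flawed reward model can be exploited as N grows.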
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Explainable Artificial Intelligence (XAI)
