Scalable Best-of-N Selection for Large Language Models via Self-Certainty
Zhewei Kang, Xuandong Zhao, Dawn Song

TL;DR
This paper introduces self-certainty, an efficient metric based on LLM output probabilities, to improve best-of-N selection for reasoning tasks without external reward models, scaling well and enhancing performance.
Contribution
It proposes self-certainty as a novel, reward-free response evaluation method that scales with sample size and improves reasoning in large language models.
Findings
Self-certainty correlates with response accuracy.
It scales effectively with increasing sample size N.
It enhances reasoning performance beyond greedy decoding.
Abstract
Best-of-N selection is a key technique for improving the reasoning performance of Large Language Models (LLMs) through increased test-time computation. Current state-of-the-art methods often employ computationally intensive reward models for response evaluation and selection. Reward-free alternatives, like self-consistency and universal self-consistency, are limited in their ability to handle open-ended generation tasks or scale effectively. To address these limitations, we propose self-certainty, a novel and efficient metric that leverages the inherent probability distribution of LLM outputs to estimate response quality without requiring external reward models. We hypothesize that higher distributional self-certainty, aggregated across multiple samples, correlates with improved response accuracy, as it reflects greater confidence in the generated output. Through extensive experiments…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsTopic Modeling · Explainable Artificial Intelligence (XAI) · Natural Language Processing Techniques
