BoNBoN Alignment for Large Language Models and the Sweetness of Best-of-n Sampling

Lin Gui; Cristina Gârbacea; Victor Veitch

arXiv:2406.00832·cs.CL·November 5, 2024

TL;DR

This paper analyzes best-of-$n$ sampling for aligning large language models to human preferences. It shows that best-of-$n$ is essentially optimal in the trade-off between win rate against the base model and KL distance from it, and proposes BoNBoN alignment, a fine-tuning method that efficiently mimics the best-of-$n$ distribution.

Contribution

The paper establishes that best-of-$n$ sampling is essentially optimal for maximizing win rate at a given KL distance from the base model, and introduces BoNBoN alignment to imitate the best-of-$n$ distribution efficiently.

Findings

01. Best-of-$n$ is essentially optimal for maximizing win rate.

02. BoNBoN alignment improves preference alignment with minimal off-target effects.

03. Mimicking the best-of-$n$ distribution improves preference alignment without significant performance loss.
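
As a worked illustration of the win-rate versus KL trade-off behind these findings, the following is a standard order-statistics calculation and is not quoted from this page. It assumes that rewards under the base model have a continuous distribution (no ties). The symbols $\pi_0$ (base model), $\pi^{(n)}$ (best-of-$n$ distribution), $r$ (reward), and $U = F(r(Y))$ (reward quantile under $\pi_0$) are notation introduced here for illustration. Under $\pi_0$, $U$ is uniform on $[0,1]$, and the best of $n$ draws has quantile density $n u^{n-1}$, so

$$
\Pr\big[r(Y^{(n)}) > r(Y_0)\big] = \int_0^1 u \cdot n u^{n-1}\,du = \frac{n}{n+1},
\qquad
\mathrm{KL}\big(\pi^{(n)} \,\|\, \pi_0\big) = \int_0^1 n u^{n-1} \log\big(n u^{n-1}\big)\,du = \log n - \frac{n-1}{n}.
$$

For example, $n = 4$ gives a win rate of $4/5$ at a KL distance of $\log 4 - 3/4 \approx 0.636$ nats. When ties are possible, these closed forms should be read as approximations.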

Abstract

This paper concerns the problem of aligning samples from large language models to human preferences using best-of-$n$ sampling, where we draw $n$ samples, rank them, and return the best one. We consider two fundamental problems. First: what is the relationship between best-of-$n$ and approaches to alignment that train LLMs to output samples with a high expected reward (e.g., RLHF or DPO)? To answer this, we embed both the best-of-$n$ distribution and the sampling distributions learned by alignment procedures in a common class of tiltings of the base LLM distribution. We then show that, within this class, best-of-$n$ is essentially optimal in terms of the trade-off between win-rate against the base model vs KL distance from the base model. That is, best-of-$n$ is the best choice of alignment distribution if the goal is to maximize win rate. However, best-of-$n$ requires drawing $n$ …
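
The procedure described in the abstract (draw $n$ samples, rank them, return the best one) can be sketched in a few lines. This is a minimal toy illustration, not the authors' implementation: `best_of_n`, `estimate_win_rate`, `toy_sample`, and `toy_reward` are hypothetical names, and a standard normal draw stands in for an LLM response with the identity function as its reward.

```python
import random


def best_of_n(sample_fn, reward_fn, n, rng):
    """Draw n candidates from the base sampler and return the highest-reward one."""
    candidates = [sample_fn(rng) for _ in range(n)]
    return max(candidates, key=reward_fn)


def estimate_win_rate(sample_fn, reward_fn, n, trials, seed=0):
    """Monte Carlo estimate of how often best-of-n beats one fresh base sample."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        aligned = best_of_n(sample_fn, reward_fn, n, rng)
        base = sample_fn(rng)
        if reward_fn(aligned) > reward_fn(base):
            wins += 1
    return wins / trials


# Toy stand-ins (assumptions): a "response" is a real number drawn from N(0, 1),
# and its "reward" is the number itself.
def toy_sample(rng):
    return rng.gauss(0.0, 1.0)


def toy_reward(x):
    return x


win_rate = estimate_win_rate(toy_sample, toy_reward, n=4, trials=20000)
print(f"estimated win rate of best-of-4 vs base: {win_rate:.3f}")
```

With a continuous reward and no ties, the estimate should land near $n/(n+1)$. The sketch also makes the cost visible: every returned sample requires $n$ draws from the base sampler.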

Videos

BoNBoN Alignment for Large Language Models and the Sweetness of Best-of-n Sampling (SlidesLive)

Taxonomy

Topics: Topic Modeling

Methods: Balanced Selection