Learning to Choose or Choosing to Learn: Best-of-N vs. Supervised Fine-Tuning for Bit String Generation

Seamus Somerstep; Vinod Raman; Unique Subedi; Yuekai Sun

arXiv:2505.17288·stat.ML·March 31, 2026



TL;DR

Using bit string generation as a case study, this paper theoretically compares supervised fine-tuning and Best-of-N (BoN) as methods for adapting large language models to new tasks, characterizing each method's rate of convergence when the learning setting is realizable and when it is not.

Contribution

The paper gives a theoretical comparison of two standard adaptation methods and identifies when each outperforms the other: supervised fine-tuning in the realizable setting, and BoN under certain failures of realizability.

Findings

01

When the learning setting is realizable, supervised fine-tuning outperforms BoN because its rate of convergence has a better dependence on the response length.

02

When realizability fails, BoN can enjoy, depending on the failure mode, either a better rate of convergence in n or a rate with better dependence on the response length.

03

Response length enters the convergence rates of both methods, and which method depends on it more favorably changes with whether realizability holds.

Abstract

Using the bit string generation problem as a case study, we theoretically compare two standard methods for adapting large language models to new tasks. The first, referred to as supervised fine-tuning, involves training a new next token predictor on good generations. The second method, Best-of-N, trains a reward model to select good responses from a collection generated by an unaltered base model. If the learning setting is realizable, we find that supervised fine-tuning outperforms BoN through a better dependence on the response length in its rate of convergence. If realizability fails, then depending on the failure mode, BoN can enjoy a better rate of convergence in either n or a rate of convergence with better dependence on the response length.
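
The Best-of-N procedure described in the abstract can be illustrated with a small sketch. This is only a toy under stated assumptions, not the paper's construction: the base model is assumed to emit independent bits with a fixed probability `p_one`, the reward `count_ones` is a hand-written stand-in for the learned reward model, and all function names and parameters are hypothetical.

```python
import random

def sample_bit_string(length, p_one, rng):
    """Toy base model (assumption): each bit is 1 independently with prob. p_one."""
    return [1 if rng.random() < p_one else 0 for _ in range(length)]

def select_best(candidates, reward):
    """Return the candidate with the highest reward (first one wins on ties)."""
    return max(candidates, key=reward)

def best_of_n(num_candidates, length, p_one, reward, rng):
    """Draw num_candidates responses from the unaltered base model, keep the best."""
    candidates = [sample_bit_string(length, p_one, rng)
                  for _ in range(num_candidates)]
    return select_best(candidates, reward)

def count_ones(bits):
    """Hypothetical reward: number of ones. In BoN proper, the reward is learned."""
    return sum(bits)

rng = random.Random(0)  # seeded so repeated runs draw the same candidates
response = best_of_n(num_candidates=32, length=16, p_one=0.5,
                     reward=count_ones, rng=rng)
print(response, count_ones(response))
```

The base model is never updated here; only the selection step uses the reward. Supervised fine-tuning would instead change the sampler itself by training a new next-token predictor on good generations.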
