Learning Generative Selection for Best-of-N

Shubham Toshniwal; Aleksander Ficek; Siddhartha Jain; Wei Du; Vahid Noroozi; Sadegh Mahdavi; Somshubra Majumdar; Igor Gitman

arXiv:2602.02143 · cs.LG · February 3, 2026



Open Access

TL;DR

This paper demonstrates that small (1.7B-parameter) reasoning models can learn effective generative selection through reinforcement learning. Training substantially improves their ability to pick correct solutions on math and code reasoning tasks, which supports scaling test-time compute via parallel sampling.

Contribution

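The authors introduce a reinforcement learning approach (DAPO, rewarding correct selections on tasks synthesized from math and code instruction data) for training small models as generative selectors. The resulting models often approach or exceed much larger models on reasoning benchmarks.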

Findings

01

The 1.7B models consistently outperform prompting and majority-voting baselines on math (AIME24, AIME25, HMMT25) and code (LiveCodeBench) benchmarks.

02

Selection skills learned through reinforcement learning generalize to candidate solutions produced by stronger models.

03

The trained models often approach or exceed much larger models in selection quality.
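The abstract names majority voting as one of the baselines the selectors are compared against. The sketch below shows that baseline in Python, assuming each sampled solution has already been reduced to a final-answer string (the answer-extraction step is omitted, and the toy answers are hypothetical). It also computes an oracle check: whether any sample matches the reference. That check bounds what any Best-of-N selector can achieve, and the gap between the two is the selection bottleneck the abstract describes.

```python
from collections import Counter


def majority_vote(answers):
    """Majority-voting baseline: return the most frequent final answer.

    Counter.most_common orders equal counts by first occurrence,
    so ties go to the answer that appeared first.
    """
    if not answers:
        raise ValueError("need at least one answer")
    return Counter(answers).most_common(1)[0][0]


def oracle_correct(answers, reference):
    """Upper bound for any selector: at least one sample is correct (pass@N)."""
    return reference in answers


# Hypothetical toy example: the correct answer is present but in the minority.
answers = ["12", "12", "7", "12", "7"]
reference = "7"
print(majority_vote(answers) == reference)  # False
print(oracle_correct(answers, reference))   # True
```

In this case majority voting misses a correct answer that is present in the pool. A selector that reads and compares the full solutions, rather than only counting answers, could in principle recover it.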

Abstract

Scaling test-time compute via parallel sampling can substantially improve LLM reasoning, but is often limited by Best-of-N selection quality. Generative selection methods, such as GenSelect, address this bottleneck, yet strong selection performance remains largely limited to large models. We show that small reasoning models can acquire strong GenSelect capabilities through targeted reinforcement learning. To this end, we synthesize selection tasks from large-scale math and code instruction datasets by filtering to instances with both correct and incorrect candidate solutions, and train 1.7B-parameter models with DAPO to reward correct selections. Across math (AIME24, AIME25, HMMT25) and code (LiveCodeBench) reasoning benchmarks, our models consistently outperform prompting and majority-voting baselines, often approaching or exceeding much larger models. Moreover, these gains generalize…
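The following Python sketch covers the data and reward side described in the abstract. It assumes each candidate solution already carries a correctness label, for example from final-answer matching in math or test execution in code. The names `Candidate`, `build_selection_tasks`, `format_prompt` and `selection_reward`, the `Judgment: <index>` output convention, and the candidate cap are hypothetical illustrations, not the paper's actual prompt or code. The DAPO policy update itself is not shown; the sketch only produces the tasks and the scalar reward such a trainer would consume.

```python
import random
import re
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str         # full candidate solution
    is_correct: bool  # assumed label, e.g. from answer matching or unit tests


@dataclass
class SelectionTask:
    problem: str
    candidates: list


def build_selection_tasks(problems, max_candidates=8, seed=0):
    """Subsample and shuffle candidates per problem, keeping only tasks that
    mix correct and incorrect solutions (all-correct or all-wrong pools give
    no learning signal for selection)."""
    rng = random.Random(seed)
    tasks = []
    for problem, pool in problems:
        sample = rng.sample(pool, min(max_candidates, len(pool)))
        if {c.is_correct for c in sample} != {True, False}:
            continue
        tasks.append(SelectionTask(problem, sample))
    return tasks


def format_prompt(task):
    """Hypothetical selection prompt: problem, numbered candidates, output format."""
    parts = [f"Problem:\n{task.problem}\n"]
    for i, c in enumerate(task.candidates):
        parts.append(f"Solution {i}:\n{c.text}\n")
    parts.append("Compare the solutions and end with: Judgment: <index>")
    return "\n".join(parts)


JUDGMENT_RE = re.compile(r"Judgment:\s*(\d+)")


def selection_reward(task, completion):
    """Binary reward: 1.0 if the last stated index points at a correct candidate."""
    matches = JUDGMENT_RE.findall(completion)
    if not matches:
        return 0.0  # unparseable output earns nothing
    idx = int(matches[-1])
    if idx >= len(task.candidates):
        return 0.0  # out-of-range index earns nothing
    return 1.0 if task.candidates[idx].is_correct else 0.0
```

Filtering on the sampled subset, rather than on the full pool, guarantees that every prompt the model sees contains at least one correct and one incorrect option.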


Taxonomy

Topics: Topic Modeling · Machine Learning and Data Classification · Machine Learning and Algorithms