Sample Efficient Preference Alignment in LLMs via Active Exploration

Viraj Mehta; Syrine Belakaria; Vikramjeet Das; Ojash Neopane; Yijia Dai; Ilija Bogunovic; Barbara Engelhardt; Stefano Ermon; Jeff Schneider; Willie Neiswanger

arXiv:2312.00267 · cs.LG · March 21, 2025 · 1 citation

TL;DR

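This paper introduces an active exploration method for preference alignment in large language models. It reduces the cost of human feedback by choosing the contexts at which feedback is requested, within an active contextual dueling bandit framework that has a proven polynomial worst-case regret bound.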

Contribution

The paper formalizes preference alignment as an active contextual dueling bandit problem, proposes an active exploration algorithm with a polynomial worst-case regret bound, and extends the setting and methodology for practical use in LLM preference alignment.

Findings

01. Outperforms baselines with limited human preference samples

02. Effective on multiple language models and datasets

03. Contributes two new real-world datasets

Abstract

Preference-based feedback is important for many applications in machine learning where evaluation of a reward function is not feasible. Notable recent examples arise in preference alignment for large language models, including in reinforcement learning from human feedback (RLHF) and direct preference optimization (DPO). For many applications of preference alignment, the cost of acquiring human feedback can be substantial. In this work, we take advantage of the fact that one can often choose contexts at which to obtain human feedback to most efficiently identify a good policy, and formalize the setting as an active contextual dueling bandit problem. We propose an active exploration algorithm to efficiently select the data and provide theoretical proof that it has a polynomial worst-case regret bound. We extend the setting and methodology for practical use in preference alignment of large…
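
As an illustration only, the active contextual dueling bandit setting described in the abstract can be sketched as a toy simulation. This is not the paper's algorithm. The names `bt_prob`, `select_query`, and `run`, the hidden reward table, the Bradley-Terry simulated labeler, and the "query the least-compared pair" rule are all assumptions introduced here to show the shape of the loop: the learner chooses both the context and the pair of actions to compare, then receives a binary preference.

```python
import itertools
import math
import random


def bt_prob(r_a, r_b):
    """Bradley-Terry probability that the action with reward r_a is preferred."""
    return 1.0 / (1.0 + math.exp(-(r_a - r_b)))


def select_query(counts):
    """Pick the (context, a, b) triple with the widest uncertainty.

    counts maps (context, a, b) to the number of duels observed so far.
    The width 1/sqrt(n + 1) is a crude, assumed stand-in for a confidence
    interval; ties are broken by insertion order.
    """
    return max(counts, key=lambda k: 1.0 / math.sqrt(counts[k] + 1))


def run(rewards, budget, seed=0):
    """Simulate `budget` actively chosen duels.

    rewards[c][a] is a hidden toy reward for action a in context c
    (hypothetical; a real labeler would be a human annotator).
    Returns (counts, wins), where wins counts how often a beat b.
    """
    rng = random.Random(seed)
    counts, wins = {}, {}
    for c, row in enumerate(rewards):
        for a, b in itertools.combinations(range(len(row)), 2):
            counts[(c, a, b)] = 0
            wins[(c, a, b)] = 0
    for _ in range(budget):
        c, a, b = select_query(counts)  # learner picks context AND pair
        a_wins = rng.random() < bt_prob(rewards[c][a], rewards[c][b])
        counts[(c, a, b)] += 1
        wins[(c, a, b)] += int(a_wins)
    return counts, wins


if __name__ == "__main__":
    counts, wins = run([[0.0, 1.0], [2.0, 0.0, 1.0]], budget=40)
    for key in counts:
        print(key, counts[key], wins[key])
```

The selection rule here depends only on how often a pair has been compared, so it spreads the budget evenly. A method with regret guarantees would instead base the choice on a learned model of preferences and its uncertainty.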

Code & Models

Repositories

belakaria/active-llm-alignment (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Speech and dialogue systems · Machine Learning and Algorithms