Nearly Optimal Active Preference Learning and Its Application to LLM Alignment
Yao Zhao, Kwang-Sung Jun

TL;DR
This paper introduces new active learning algorithms tailored for preference learning in LLM alignment, providing theoretical guarantees and demonstrating improved sample efficiency on real datasets.
Contribution
It proposes problem-specific active learning algorithms for preference data that improve sample efficiency; one of them comes with the first instance-dependent label complexity guarantee for this setting.
Findings
Improved sample efficiency over existing methods
Theoretical label complexity guarantee for preference learning
Effective performance demonstrated on real-world datasets
Abstract
Aligning large language models (LLMs) depends on high-quality datasets of human preference labels, which are costly to collect. Although active learning has been studied to improve sample efficiency relative to passive collection, many existing approaches adopt classical experimental design criteria such as G- or D-optimality. These objectives are not tailored to the structure of preference learning, leaving open the design of problem-specific algorithms. In this work, we identify a simple intuition specific to preference learning that calls into question the suitability of these existing design objectives. Motivated by this insight, we propose two active learning algorithms. The first provides the first instance-dependent label complexity guarantee for this setting, and the second is a simple, practical greedy method. We evaluate our algorithm on real-world preference datasets and…
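The abstract contrasts the proposed methods with classical design criteria such as D-optimality. For reference only, here is a minimal sketch of greedy D-optimal pair selection. It is not the authors' algorithm. It assumes a linear Bradley-Terry model in which each candidate comparison (a, b) is represented by a feature-difference vector z = phi(a) - phi(b). The function name `greedy_d_optimal` and the regularizer `reg` are hypothetical. The sketch also uses the unweighted (linear-model) criterion. A logistic-model design would additionally weight each z by the sigmoid derivative at the current parameter estimate.

```python
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def mat_vec(m: Matrix, v: Sequence[float]) -> Vector:
    """Multiply a square matrix by a vector."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def greedy_d_optimal(pairs: Sequence[Vector], budget: int, reg: float = 1.0) -> List[int]:
    """Greedily pick pair indices that maximize log det(reg*I + sum of z z^T).

    Hypothetical helper: each entry of `pairs` is an assumed feature-difference
    vector z = phi(a) - phi(b). By the matrix determinant lemma, adding z raises
    log det(A) by log(1 + z^T A^{-1} z), so each greedy step picks the unchosen
    z with the largest z^T A^{-1} z.
    """
    d = len(pairs[0])
    # A starts as reg * I, so A^{-1} starts as (1 / reg) * I.
    a_inv = [[(1.0 / reg) if i == j else 0.0 for j in range(d)] for i in range(d)]
    chosen: List[int] = []
    for _ in range(min(budget, len(pairs))):
        best_idx, best_score = -1, -1.0
        for idx, z in enumerate(pairs):
            if idx in chosen:
                continue
            score = dot(z, mat_vec(a_inv, z))  # information gain proxy z^T A^{-1} z
            if score > best_score:
                best_idx, best_score = idx, score
        z = pairs[best_idx]
        u = mat_vec(a_inv, z)  # A^{-1} z (A^{-1} is symmetric)
        denom = 1.0 + dot(z, u)
        # Sherman-Morrison: (A + z z^T)^{-1} = A^{-1} - u u^T / (1 + z^T A^{-1} z)
        a_inv = [[a_inv[i][j] - u[i] * u[j] / denom for j in range(d)] for i in range(d)]
        chosen.append(best_idx)
    return chosen


if __name__ == "__main__":
    # Toy candidates: pair 1 nearly duplicates pair 0; pair 2 is orthogonal to pair 0.
    candidates = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
    print(greedy_d_optimal(candidates, budget=2))  # prints [0, 2]
```

On the toy input, the selector returns `[0, 2]`. After the first pair is chosen, the near-duplicate second pair adds little to the determinant, so the orthogonal third pair wins. A criterion like this measures generic information gain about the parameter. The abstract argues that such objectives are not tailored to the structure of preference learning.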
Taxonomy
Topics: Machine Learning and Algorithms · Machine Learning and Data Classification · Topic Modeling
