ActiveUltraFeedback: Efficient Preference Data Generation using Active Learning

Davit Melikidze; Marian Schneider; Jessica Lam; Martin Wertich; Ido Hakimi; Barna Pásztor; Andreas Krause

arXiv:2603.09692 · cs.LG · March 11, 2026

TL;DR

ActiveUltraFeedback introduces an active learning pipeline that efficiently generates high-quality preference data for training language models, reducing annotation costs while maintaining or improving performance.

Contribution

The paper presents a modular active learning framework with two novel response selection methods, Double Reverse Thompson Sampling (DRTS) and DeltaUCB, that reduces the amount of annotated preference data needed for reinforcement learning from human feedback.

Findings

1. Achieves comparable or better performance with one-sixth of the data
2. Demonstrates effectiveness of novel response selection methods
3. Provides publicly available datasets and pipeline

Abstract

Reinforcement Learning from Human Feedback (RLHF) has become the standard for aligning Large Language Models (LLMs), yet its efficacy is bottlenecked by the high cost of acquiring preference data, especially in low-resource and expert domains. To address this, we introduce ActiveUltraFeedback, a modular active learning pipeline that leverages uncertainty estimates to dynamically identify the most informative responses for annotation. Our pipeline facilitates the systematic evaluation of standard response selection methods alongside Double Reverse Thompson Sampling (DRTS) and DeltaUCB, two novel methods prioritizing response pairs with large predicted quality gaps, leveraging recent results showing that such pairs provide good signals for fine-tuning. Our experiments demonstrate that ActiveUltraFeedback yields high-quality datasets that lead to significant improvements in downstream…
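
The abstract describes selecting response pairs with large predicted quality gaps, guided by uncertainty estimates. The sketch below illustrates that general idea only; it is not the paper's selection rule, whose definition does not appear in this summary. The function name, the scoring formula (optimistic bound of one response minus pessimistic bound of another), and the `beta` parameter are all assumptions made for illustration.

```python
from itertools import permutations


def select_pair_by_optimistic_gap(means, stds, beta=1.0):
    """Pick the ordered pair (i, j) with the largest optimistic quality gap.

    Hypothetical scoring rule (an assumption, not taken from the paper):
        score(i, j) = (means[i] + beta * stds[i]) - (means[j] - beta * stds[j])
    where means/stds are a reward model's predicted quality and uncertainty
    for each candidate response to the same prompt.
    """
    if len(means) != len(stds):
        raise ValueError("means and stds must have the same length")
    if len(means) < 2:
        raise ValueError("need at least two candidate responses")

    best_pair, best_score = None, float("-inf")
    # Enumerate every ordered pair of distinct candidates.
    for i, j in permutations(range(len(means)), 2):
        upper_i = means[i] + beta * stds[i]  # optimistic estimate for i
        lower_j = means[j] - beta * stds[j]  # pessimistic estimate for j
        score = upper_i - lower_j
        if score > best_score:
            best_pair, best_score = (i, j), score
    return best_pair, best_score


# Illustrative numbers only: four candidate responses for one prompt.
means = [0.2, 0.9, 0.5, 0.1]
stds = [0.05, 0.10, 0.40, 0.30]
pair, score = select_pair_by_optimistic_gap(means, stds, beta=2.0)
```

With `beta=0` the rule reduces to picking the largest gap in predicted means; a larger `beta` shifts the choice toward responses whose quality is still uncertain, which is the kind of trade-off an active learning pipeline has to make before sending a pair for annotation.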

Taxonomy

Topics: Machine Learning and Algorithms · Explainable Artificial Intelligence (XAI) · Topic Modeling