Dual Active Learning for Reinforcement Learning from Human Feedback

Pangpang Liu; Chengchun Shi; Will Wei Sun

arXiv:2410.02504 · stat.ML · January 3, 2025


TL;DR

This paper introduces a dual active learning approach for reinforcement learning from human feedback, optimizing the selection of conversations and teachers to efficiently learn reward functions for aligning large language models with human preferences.

Contribution

The paper proposes a dual active reward learning algorithm, paired with pessimistic RL for policy learning, and supports it with theoretical guarantees and empirical results that improve on existing methods.
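In offline RL, "pessimism" commonly means acting on a lower confidence bound of the estimated reward, so that responses in poorly covered regions of the data are not over-trusted. The sketch below illustrates that idea under stated assumptions: a reward linear in a known feature vector, an information matrix for the reward estimator, and a penalty width `beta`. The function name and all parameters are hypothetical, and this is not claimed to be the authors' exact policy-learning step.

```python
import numpy as np

def pessimistic_choice(features, theta_hat, info_matrix, beta):
    """Pick the candidate response with the highest lower confidence bound.

    Assumes (hypothetically) a linear reward r = theta^T phi.
    features:    (K, d) array, one feature vector phi(x, a) per candidate.
    theta_hat:   (d,) estimated reward parameter.
    info_matrix: (d, d) positive-definite information matrix of the estimator.
    beta:        width of the uncertainty penalty (a tuning assumption).
    """
    cov = np.linalg.inv(info_matrix)
    # Uncertainty of each candidate's estimated reward: sqrt(phi^T Sigma^{-1} phi)
    widths = np.sqrt(np.einsum("kd,de,ke->k", features, cov, features))
    # Lower confidence bound = point estimate minus the uncertainty penalty
    lcb = features @ theta_hat - beta * widths
    return int(np.argmax(lcb)), lcb
```

With `beta = 0` this reduces to greedily maximizing the estimated reward; a positive `beta` can instead prefer a slightly lower estimate whose direction the data covers well.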

Findings

01. The reward estimator achieves minimal generalized variance asymptotically.

02. The sub-optimality of the learned policy scales as O(1/√T) in the sample size T.

03. The method outperforms existing approaches in simulations and LLM experiments.
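To make the O(1/√T) rate in the second finding concrete, suppose the bound holds with some constant C (our notation; the constant and the symbol for the learned policy are assumptions, not taken from the paper). Quadrupling the sample budget then halves the bound, and reaching a target sub-optimality ε needs a budget on the order of C²/ε²:

```latex
\mathrm{SubOpt}(\hat{\pi}_T) \le \frac{C}{\sqrt{T}}
\quad\Longrightarrow\quad
\frac{C}{\sqrt{4T}} = \frac{1}{2}\cdot\frac{C}{\sqrt{T}},
\qquad
\frac{C}{\sqrt{T}} \le \varepsilon \iff T \ge \frac{C^{2}}{\varepsilon^{2}}.
```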

Abstract

Aligning large language models (LLMs) with human preferences is critical to recent advances in generative artificial intelligence. Reinforcement learning from human feedback (RLHF) is widely applied to achieve this objective. A key step in RLHF is to learn the reward function from human feedback. However, human feedback is costly and time-consuming, making it essential to collect high-quality conversation data for human teachers to label. Additionally, different human teachers have different levels of expertise. It is thus critical to query the most appropriate teacher for their opinions. In this paper, we use offline reinforcement learning (RL) to formulate the alignment problem. Motivated by the idea of $D$-optimal design, we first propose a dual active reward learning algorithm for the simultaneous selection of conversations and teachers. Next, we apply pessimistic RL to solve the…
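A D-optimal design picks data that maximizes the determinant of the estimator's information matrix, which is the same as minimizing its generalized variance, the determinant of the inverse information matrix (compare the first finding). The sketch below shows one greedy step of jointly choosing a (conversation, teacher) pair under stated assumptions: a Bradley-Terry preference model in which each teacher has a hypothetical rationality scale `gamma` multiplying the reward difference, and a reward linear in known features. The function names, the model, and the greedy one-step rule are illustrative assumptions in the spirit of the abstract, not the authors' exact criterion.

```python
import numpy as np

def query_information(z, theta, gamma):
    """Fisher information from one pairwise comparison under an assumed
    Bradley-Terry model: P(first response preferred) = sigmoid(gamma * theta @ z)."""
    p = 1.0 / (1.0 + np.exp(-gamma * (theta @ z)))
    return gamma**2 * p * (1.0 - p) * np.outer(z, z)

def select_pair(diffs, gammas, theta, info):
    """Greedy D-optimal step over all (conversation, teacher) pairs.

    diffs:  (N, d) feature differences phi(x, a1) - phi(x, a2), one per conversation.
    gammas: (M,) hypothetical per-teacher rationality scales.
    theta:  (d,) current reward-parameter estimate.
    info:   (d, d) information matrix accumulated so far (positive definite).
    Returns the (conversation, teacher) indices that maximize log det of the
    updated information matrix, i.e. minimize the generalized variance.
    """
    best, best_val = None, -np.inf
    for i, z in enumerate(diffs):
        for j, g in enumerate(gammas):
            # slogdet is numerically safer than log(det(...)) for larger d
            _, logdet = np.linalg.slogdet(info + query_information(z, theta, g))
            if logdet > best_val:
                best, best_val = (i, j), logdet
    return best
```

In a sequential loop one would, under these assumptions, add the chosen query's information to `info`, collect the label, refit `theta`, and repeat until the labeling budget is spent.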

