The Power of Active Multi-Task Learning in Reinforcement Learning from Human Feedback

Ruitao Chen; Liwei Wang

arXiv:2405.11226 · cs.LG · March 6, 2025

TL;DR

This paper models reinforcement learning from human feedback as a contextual dueling bandit problem, proposing a task relevance-aware sampling strategy that reduces sample complexity and enhances learning efficiency.

Contribution

It introduces a novel formulation of RLHF as a contextual dueling bandit problem with a linear representation, and develops an algorithm that adaptively allocates samples based on task relevance.
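As a rough illustration of relevance-aware allocation, the sketch below splits a fixed labeling budget across source tasks in proportion to their squared relevance scores. The function name `allocate_samples`, the squared weighting, and the `min_per_task` floor are assumptions made for this example; the summary above does not state the paper's exact allocation rule.

```python
import numpy as np

def allocate_samples(relevance, total_budget, min_per_task=1):
    """Split a sample budget across source tasks by relevance.

    Hypothetical rule (an assumption, not taken from the paper):
    each task receives a share proportional to its squared
    relevance score, with a small floor per task.
    """
    nu = np.abs(np.asarray(relevance, dtype=float))
    sq = nu ** 2
    if sq.sum() == 0.0:
        # No relevance information: fall back to a uniform split.
        weights = np.full(len(nu), 1.0 / len(nu))
    else:
        weights = sq / sq.sum()
    counts = np.floor(weights * total_budget).astype(int)
    # The floor can push the total slightly above the budget.
    return np.maximum(counts, min_per_task)
```

With equal relevance scores this reduces to uniform sampling; tasks with larger scores receive more of the budget, which is the qualitative behavior the contribution describes.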

Findings

01

Sample complexity is reduced by considering task relevance.

02

The proposed method achieves ε-optimality with fewer source task samples.

03

Target task sample complexity scales linearly with latent space dimension.

Abstract

Reinforcement learning from human feedback (RLHF) has contributed to performance improvements in large language models. To tackle its reliance on substantial amounts of human-labeled data, a successful approach is multi-task representation learning, which involves learning a high-quality, low-dimensional representation from a wide range of source tasks. In this paper, we formulate RLHF as the contextual dueling bandit problem and assume a common linear representation. We demonstrate that the sample complexity of source tasks in multi-task RLHF can be reduced by considering task relevance and allocating different sample sizes to source tasks with varying task relevance. We further propose an algorithm to estimate task relevance by a small number of additional data and then learn a policy. We prove that to achieve $\varepsilon$-optimality, the sample complexity of the source tasks can be…
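To make the setting concrete, here is a minimal sketch of a dueling-bandit preference model with a shared linear representation. It assumes a Bradley-Terry (logistic) link and a reward of the form phi^T B w, where `B` is a d-by-k matrix shared by all tasks and `w` is a task-specific k-vector. These symbols, the logistic link, and the Gaussian features are illustrative assumptions, not details taken from the abstract.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def preference_prob(phi_a, phi_b, B, w):
    """Probability that action a is preferred over action b.

    Assumes reward r = phi^T B w and a logistic (Bradley-Terry)
    link on the reward difference; both are assumptions here.
    """
    return sigmoid((phi_a - phi_b) @ B @ w)

def sample_duels(rng, B, w, n):
    """Draw n synthetic pairwise comparisons for one task."""
    d = B.shape[0]
    phi_a = rng.normal(size=(n, d))  # features of first action
    phi_b = rng.normal(size=(n, d))  # features of second action
    p = preference_prob(phi_a, phi_b, B, w)
    y = (rng.random(n) < p).astype(int)  # 1 if a is preferred
    return phi_a, phi_b, y

# Example: one task with ambient dimension 5 and latent dimension 2.
rng = np.random.default_rng(0)
B = rng.normal(size=(5, 2))
w = rng.normal(size=2)
phi_a, phi_b, y = sample_duels(rng, B, w, n=200)
```

Because every task shares `B`, comparisons from source tasks can inform the low-dimensional representation, leaving only the k-dimensional `w` to be learned on the target task.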

Taxonomy

Topics: Reinforcement Learning in Robotics · Evolutionary Algorithms and Applications