Provable Multi-Party Reinforcement Learning with Diverse Human Feedback

Huiying Zhong; Zhun Deng; Weijie J. Su; Zhiwei Steven Wu; Linjun Zhang

arXiv:2403.05006 · cs.LG · March 11, 2024

TL;DR

This paper develops a theoretical framework for multi-party reinforcement learning with human feedback, addressing the challenges of aggregating diverse preferences and establishing sample complexity bounds for various social welfare functions.

Contribution

The paper presents the first theoretical analysis of multi-party RLHF. It combines meta-learning with social welfare functions and derives sample complexity bounds together with fairness guarantees.
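One common way to make "meta-learning multiple preferences" concrete is a shared low-dimensional representation with a separate linear reward head for each party, fit jointly on every party's comparisons so that all parties' data informs the shared part. The following is a minimal sketch under that assumption only. The synthetic data, the sizes `d`, `k`, `M`, `n`, the Bradley-Terry likelihood and plain gradient descent are choices made here for illustration. It does not reproduce the paper's offline estimator or its guarantees.

```python
import numpy as np

# Hypothetical sizes (illustrative only): feature dim d, shared dim k,
# number of parties M, comparisons per party n.
d, k, M, n = 8, 2, 3, 300
rng = np.random.default_rng(0)

# Synthetic ground truth: one shared map B_true (d x k) and one head per party.
B_true = rng.normal(size=(d, k))
W_true = rng.normal(size=(M, k))

# Each comparison is summarized by z = phi(first) - phi(second);
# y = 1 means the first item was preferred (Bradley-Terry labels).
Z = rng.normal(size=(M, n, d))
s_true = (Z @ B_true @ W_true[:, :, None])[..., 0]           # (M, n) true reward gaps
Y = (rng.random((M, n)) < 1.0 / (1.0 + np.exp(-s_true))).astype(float)


def loss_and_grads(B, W):
    """Mean Bradley-Terry log-loss over all parties, with gradients for B and W."""
    s = (Z @ B @ W[:, :, None])[..., 0]                       # predicted reward gaps
    # Stable form of -y*log(sigmoid(s)) - (1 - y)*log(1 - sigmoid(s))
    loss = np.mean(Y * np.logaddexp(0, -s) + (1 - Y) * np.logaddexp(0, s))
    g = (1.0 / (1.0 + np.exp(-s)) - Y) / (M * n)              # dloss/ds
    Zg = np.einsum("mnd,mn->md", Z, g)                        # Z_m^T g_m for each party
    grad_W = Zg @ B                                           # row m: B^T Z_m^T g_m
    grad_B = Zg.T @ W                                         # sum_m outer(Z_m^T g_m, w_m)
    return loss, grad_B, grad_W


# Joint gradient descent on the shared map and all party-specific heads.
B = 0.1 * rng.normal(size=(d, k))
W = 0.1 * rng.normal(size=(M, k))
initial_loss, _, _ = loss_and_grads(B, W)
lr = 0.2
for _ in range(3000):
    _, grad_B, grad_W = loss_and_grads(B, W)
    B -= lr * grad_B
    W -= lr * grad_W
final_loss, _, _ = loss_and_grads(B, W)
print(f"mean log-loss: {initial_loss:.3f} -> {final_loss:.3f}")
```

After training, party `m`'s estimated reward gap for a comparison `z` is `z @ B @ W[m]`. Only the `k` numbers in `W[m]` are party-specific, which is where the statistical sharing across parties comes from.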

Findings

1. Multi-party RLHF has higher sample complexity than single-party RLHF.
2. Meta-learning enables learning the preferences of multiple parties effectively.
3. Different social welfare functions aggregate preferences differently and offer different fairness properties.
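To see why the choice of social welfare function matters, here is a small sketch that applies three textbook welfare functions to hypothetical per-party utilities for three candidate policies. Utilitarian welfare is the sum of utilities. Nash welfare is their product, computed here as a sum of logs. Leximin maximizes the worst-off party's utility, then the next worst-off, and so on. The policy names and numbers are invented for illustration. The paper's precise welfare definitions (for instance, any reference point or normalization of utilities) may differ.

```python
import math

# Hypothetical per-party utilities for three candidate policies (illustrative only).
utilities = {
    "equal_split": [0.3, 0.3, 0.3],
    "compromise": [0.25, 0.85, 0.85],
    "majority_favoring": [1.0, 1.0, 0.05],
}


def utilitarian(us):
    """Sum of individual utilities."""
    return sum(us)


def nash(us):
    """Nash welfare (product of utilities) as a sum of logs; assumes all utilities > 0."""
    return sum(math.log(u) for u in us)


def leximin_key(us):
    """Sorted ascending, so tuple comparison looks at the worst-off party first."""
    return tuple(sorted(us))


def best_policy(welfare):
    """Return the policy maximizing the given welfare criterion."""
    return max(utilities, key=lambda p: welfare(utilities[p]))


choices = {
    "utilitarian": best_policy(utilitarian),
    "nash": best_policy(nash),
    "leximin": best_policy(leximin_key),
}
print(choices)
```

With these numbers each criterion selects a different policy. Utilitarian welfare picks `majority_favoring` (sum 2.05), Nash welfare picks `compromise` (product about 0.18), and leximin picks `equal_split` (worst-off utility 0.3).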

Abstract

Reinforcement learning with human feedback (RLHF) is an emerging paradigm to align models with human preferences. Typically, RLHF aggregates preferences from multiple individuals who have diverse viewpoints that may conflict with each other. Our work initiates the theoretical study of multi-party RLHF that explicitly models the diverse preferences of multiple individuals. We show how traditional RLHF approaches can fail since learning a single reward function cannot capture and balance the preferences of multiple individuals. To overcome such limitations, we incorporate meta-learning to learn multiple preferences and adopt different social welfare functions to aggregate the preferences across multiple parties. We focus on the offline learning setting and establish sample complexity bounds, along with efficiency and fairness guarantees, for optimizing diverse social welfare…
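The claim that a single reward function cannot balance conflicting preferences shows up even in a toy case. This minimal sketch assumes the standard Bradley-Terry model P(A ≻ B) = σ(r_A − r_B), two hypothetical parties and a single pair of responses. The comparison counts are invented for illustration and are not data from the paper, whose argument is more general.

```python
import math

# Hypothetical comparison counts for one prompt with responses A and B:
# counts[party] = (times A preferred, times B preferred). Illustrative only.
counts = {
    "party_1": (90, 10),  # party 1 mostly prefers A
    "party_2": (10, 90),  # party 2 mostly prefers B
}


def bt_reward_gap(wins_a, wins_b):
    """MLE of r_A - r_B under Bradley-Terry for a single pair:
    the logit of the empirical win rate of A (requires both counts > 0)."""
    p = wins_a / (wins_a + wins_b)
    return math.log(p / (1 - p))


# Multi-party view: one reward model per party.
per_party = {party: bt_reward_gap(*c) for party, c in counts.items()}

# Single-party view: one reward model fit to the pooled comparisons.
pooled_a = sum(a for a, _ in counts.values())
pooled_b = sum(b for _, b in counts.values())
pooled = bt_reward_gap(pooled_a, pooled_b)

print(per_party)  # party_1 ≈ 2.197, party_2 ≈ -2.197
print(pooled)     # 0.0: the pooled model is indifferent between A and B
```

The pooled model reports indifference even though each party holds a strong preference, so the disagreement is erased before any aggregation rule can weigh it. Keeping per-party reward estimates is what allows a welfare function to trade the parties off explicitly.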

Taxonomy

Topics: Reinforcement Learning in Robotics
Methods: Focus · ALIGN