Provable Multi-Party Reinforcement Learning with Diverse Human Feedback
Huiying Zhong, Zhun Deng, Weijie J. Su, Zhiwei Steven Wu, Linjun Zhang

TL;DR
This paper develops a theoretical framework for multi-party reinforcement learning with human feedback, addressing the challenges of aggregating diverse preferences and establishing sample complexity bounds for various social welfare functions.
Contribution
It introduces the first theoretical analysis of multi-party RLHF, incorporating meta-learning and social welfare functions, with sample complexity bounds and fairness guarantees.
Findings
Multi-party RLHF has higher sample complexity than single-party RLHF.
Meta-learning enables learning multiple preferences effectively.
The choice of social welfare function affects how preferences are aggregated and how fair the resulting outcome is.
Abstract
Reinforcement learning with human feedback (RLHF) is an emerging paradigm to align models with human preferences. Typically, RLHF aggregates preferences from multiple individuals who have diverse viewpoints that may conflict with each other. Our work initiates the theoretical study of multi-party RLHF that explicitly models the diverse preferences of multiple individuals. We show how traditional RLHF approaches can fail since learning a single reward function cannot capture and balance the preferences of multiple individuals. To overcome such limitations, we incorporate meta-learning to learn multiple preferences and adopt different social welfare functions to aggregate the preferences across multiple parties. We focus on the offline learning setting and establish sample complexity bounds, along with efficiency and fairness guarantees, for optimizing diverse social welfare…
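To make the aggregation step concrete, here is a minimal sketch of how different social welfare functions can select different outcomes from the same per-party rewards. It is a toy illustration, not the paper's algorithm: the reward table, the candidate names, and the three welfare functions shown (utilitarian, Nash, and egalitarian, which are standard examples from the social choice literature) are assumptions chosen for this example.

```python
import math

# Hypothetical rewards: each candidate response is scored by two parties.
# These numbers are made up for illustration and are not from the paper.
REWARDS = {
    "a": [1.0, 0.1],  # great for party 1, poor for party 2
    "b": [0.4, 0.4],  # equal for both parties
    "c": [0.7, 0.3],  # a compromise between the two
}


def utilitarian(utilities):
    """Total reward across parties; ignores how it is distributed."""
    return sum(utilities)


def nash(utilities):
    """Product of rewards (assumes non-negative rewards); penalizes imbalance."""
    return math.prod(utilities)


def egalitarian(utilities):
    """Reward of the worst-off party only."""
    return min(utilities)


def best_candidate(welfare):
    """Return the candidate that maximizes the given welfare function."""
    return max(REWARDS, key=lambda name: welfare(REWARDS[name]))


if __name__ == "__main__":
    for welfare in (utilitarian, nash, egalitarian):
        print(welfare.__name__, "->", best_candidate(welfare))
```

On this toy table the three welfare functions each pick a different candidate, which is one way to see why a single learned reward function cannot stand in for all parties at once.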
Taxonomy
Topics: Reinforcement Learning in Robotics
Methods: Focus · ALIGN
