RLHF from Heterogeneous Feedback via Personalization and Preference Aggregation

Chanwoo Park; Mingyang Liu; Dingwen Kong; Kaiqing Zhang; Asuman Ozdaglar

arXiv:2405.00254·cs.AI·May 28, 2024·2 cites

Open Access

TL;DR

This paper advances RLHF by developing frameworks to handle diverse human preferences through personalization and aggregation, with theoretical guarantees and strategies to ensure truthful feedback in heterogeneous settings.

Contribution

It introduces methods for modeling and aggregating heterogeneous human feedback in RLHF: personalized reward models, preference aggregation, and mechanisms that keep feedback truthful when labelers behave strategically.

Findings

01 Proposes personalization-based reward models with sample complexity guarantees.

02 Develops aggregation methods for diverse preferences with theoretical analysis.

03 Designs mechanisms to ensure truthful feedback from strategic human labelers.

Abstract

Reinforcement learning from human feedback (RLHF) has been an effective technique for aligning AI systems with human values, with remarkable recent successes in fine-tuning large language models. Most existing RLHF paradigms assume that human preferences are relatively homogeneous and can be encoded by a single reward model. In this paper, we focus on addressing the issues that arise from the inherent heterogeneity in human preferences, as well as from labelers' potential strategic behavior in providing feedback. Specifically, we propose two frameworks to address heterogeneous human feedback in principled ways: a personalization-based one and an aggregation-based one. For the former, we propose two approaches, based on representation learning and clustering, respectively, for learning multiple reward models that trade off the bias (due to preference heterogeneity) and variance (due…
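
The clustering idea mentioned in the abstract can be pictured with a small sketch. Nothing below comes from the paper itself: it assumes a Bradley-Terry preference model with a linear reward, and the names `bt_log_likelihood`, `assign_clusters`, and the toy feature vectors are hypothetical. Given a few candidate reward models, each labeler is assigned to the one that best explains that labeler's pairwise comparisons.

```python
import math


def bt_log_likelihood(theta, comparisons):
    """Log-likelihood of pairwise comparisons under an ASSUMED Bradley-Terry
    model with linear reward r(x) = <theta, x>.

    Each comparison is a (winner_features, loser_features) pair.
    """
    total = 0.0
    for winner, loser in comparisons:
        margin = sum(t * (w - l) for t, w, l in zip(theta, winner, loser))
        # log(sigmoid(margin)), computed in a numerically stable way
        if margin >= 0:
            total += -math.log1p(math.exp(-margin))
        else:
            total += margin - math.log1p(math.exp(margin))
    return total


def assign_clusters(cluster_thetas, user_comparisons):
    """Assign each user to the candidate reward model (by index) that gives
    that user's comparisons the highest log-likelihood."""
    return {
        user: max(
            range(len(cluster_thetas)),
            key=lambda k: bt_log_likelihood(cluster_thetas[k], comps),
        )
        for user, comps in user_comparisons.items()
    }


# Toy data (hypothetical): two candidate reward models, two labelers
# with opposite preferences over the same pair of responses.
cluster_thetas = [[1.0, 0.0], [0.0, 1.0]]
user_comparisons = {
    "a": [((1.0, 0.0), (0.0, 1.0))],  # prefers the first response
    "b": [((0.0, 1.0), (1.0, 0.0))],  # prefers the second response
}
assignment = assign_clusters(cluster_thetas, user_comparisons)
print(assignment)
```

With more clusters, each reward model fits its labelers more closely but is estimated from fewer comparisons, which is one way to read the bias and variance trade-off the abstract refers to.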

Taxonomy

Topics: Fuzzy Logic and Control Systems · Neural Networks and Applications

MethodsFocus