RLHF from Heterogeneous Feedback via Personalization and Preference Aggregation
Chanwoo Park, Mingyang Liu, Dingwen Kong, Kaiqing Zhang, Asuman Ozdaglar

TL;DR
This paper extends RLHF to diverse human preferences through two frameworks, personalization and aggregation, backed by theoretical guarantees, together with mechanisms that incentivize truthful feedback in heterogeneous settings.
Contribution
It introduces methods for modeling and aggregating heterogeneous human feedback in RLHF, including personalized reward models and truthful preference aggregation that accounts for strategic behavior by labelers.
Findings
Proposes personalization-based reward models with sample complexity guarantees.
Develops aggregation methods for diverse preferences with theoretical analysis.
Designs mechanisms to ensure truthful feedback from strategic human labelers.
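To make the aggregation finding concrete, here is a minimal sketch of one common social-choice family for combining per-labeler rewards into a single score: the power (p-)mean. Setting p = 1 gives the utilitarian average, p = 0 gives the geometric (Nash) mean, and p → -∞ gives the egalitarian minimum. This summary does not say which aggregation rule the paper analyzes, so treat this family as an illustrative assumption. The function name `p_mean` and the reward values are hypothetical.

```python
import math
from typing import Sequence


def p_mean(rewards: Sequence[float], p: float) -> float:
    """Power (generalized) mean of per-labeler rewards.

    p = 1 -> utilitarian average; p = 0 -> geometric (Nash) mean;
    p = -inf -> minimum (egalitarian); p = +inf -> maximum.
    Assumes strictly positive rewards (an illustrative simplification).
    """
    if any(r <= 0 for r in rewards):
        raise ValueError("this sketch assumes strictly positive rewards")
    n = len(rewards)
    if p == 0:
        # Limit p -> 0: geometric mean, computed in log space for stability.
        return math.exp(sum(math.log(r) for r in rewards) / n)
    if math.isinf(p):
        return min(rewards) if p < 0 else max(rewards)
    return (sum(r ** p for r in rewards) / n) ** (1.0 / p)


if __name__ == "__main__":
    # Hypothetical per-labeler rewards for two candidate responses.
    rewards_a = [0.9, 0.9, 0.1]  # two labelers like A, one strongly dislikes it
    rewards_b = [0.5, 0.5, 0.5]  # every labeler is lukewarm about B
    for p in (1.0, 0.0, -math.inf):
        winner = "A" if p_mean(rewards_a, p) > p_mean(rewards_b, p) else "B"
        print(f"p={p}: prefer {winner}")
```

With these hypothetical rewards, the utilitarian rule (p = 1) prefers A because its average is higher. The Nash (p = 0) and egalitarian (p → -∞) rules prefer B, which leaves no labeler badly off. Once preferences are heterogeneous, the choice of rule changes the outcome.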
Abstract
Reinforcement learning from human feedback (RLHF) has been an effective technique for aligning AI systems with human values, with remarkable recent successes in fine-tuning large language models. Most existing RLHF paradigms make the underlying assumption that human preferences are relatively homogeneous and can be encoded by a single reward model. In this paper, we focus on addressing the issues arising from the inherent heterogeneity in human preferences, as well as from humans' potential strategic behavior in providing feedback. Specifically, we propose two frameworks to address heterogeneous human feedback in principled ways: a personalization-based one and an aggregation-based one. For the former, we propose two approaches, based on representation learning and clustering, respectively, for learning multiple reward models that trade off the bias (due to preference heterogeneity) and variance (due…
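The clustering route in the abstract can be illustrated with a small sketch under stated assumptions. It assumes a linear Bradley-Terry reward, P(a ≻ b) = σ(θ·(φ(a) - φ(b))); this summary does not specify the reward class. The procedure is a hard-EM (k-means-style) loop: each labeler is assigned to the reward model that best explains their comparisons, then each model is refit on the pooled data of its cluster. This is not the authors' algorithm, and it does not cover the representation-learning approach. All function names (`fit_bt`, `bt_loglik`, `cluster_rewards`, `simulate_labelers`) and the synthetic data are hypothetical.

The sketch still shows the bias-variance trade-off:

- Per-labeler fits have high variance.
- A single pooled reward is biased when preferences differ.
- Clusters sit in between.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_bt(diffs, labels, steps=500, lr=0.5, l2=1e-3):
    """Linear Bradley-Terry fit: P(a > b) = sigmoid(theta . (phi(a) - phi(b))).

    Plain gradient ascent on the L2-regularized mean log-likelihood.
    """
    theta = np.zeros(diffs.shape[1])
    for _ in range(steps):
        p = sigmoid(diffs @ theta)
        grad = diffs.T @ (labels - p) / len(labels) - l2 * theta
        theta += lr * grad
    return theta


def bt_loglik(theta, diffs, labels):
    """Total Bradley-Terry log-likelihood of one labeler's comparisons."""
    z = diffs @ theta
    signed = np.where(labels == 1, z, -z)  # margin in favor of the observed choice
    return -np.sum(np.logaddexp(0.0, -signed))  # sum of log sigmoid(signed)


def cluster_rewards(data, k, iters=20, seed=0):
    """Hard-EM clustering of labelers into k linear Bradley-Terry reward models.

    data: list of (diffs, labels) pairs, one per labeler.
    Returns (thetas, assign), where assign[i] is the cluster of labeler i.
    """
    rng = np.random.default_rng(seed)
    n = len(data)
    # Farthest-first initialization from per-labeler (high-variance) estimates.
    individual = np.array([fit_bt(d, y) for d, y in data])
    centers = [int(rng.integers(n))]
    while len(centers) < k:
        dist = np.min([np.linalg.norm(individual - individual[c], axis=1) for c in centers], axis=0)
        centers.append(int(np.argmax(dist)))
    thetas = [individual[c] for c in centers]
    assign = None
    for _ in range(iters):
        # Hard E-step: each labeler joins the model that best explains their data.
        new_assign = np.array([int(np.argmax([bt_loglik(t, d, y) for t in thetas])) for d, y in data])
        if assign is not None and np.array_equal(new_assign, assign):
            break
        assign = new_assign
        # M-step: pool all comparisons in a cluster and refit one shared reward.
        for c in range(k):
            members = [i for i in range(n) if assign[i] == c]
            if members:  # keep the previous estimate if a cluster empties out
                D = np.vstack([data[i][0] for i in members])
                y = np.concatenate([data[i][1] for i in members])
                thetas[c] = fit_bt(D, y)
    return thetas, assign


def simulate_labelers(true_thetas, n_labelers, n_pairs, seed=0):
    """Hypothetical synthetic data: labeler i follows true_thetas[i % len(true_thetas)]."""
    rng = np.random.default_rng(seed)
    data = []
    for i in range(n_labelers):
        theta = true_thetas[i % len(true_thetas)]
        diffs = rng.normal(size=(n_pairs, len(theta)))
        labels = (rng.random(n_pairs) < sigmoid(diffs @ theta)).astype(float)
        data.append((diffs, labels))
    return data


if __name__ == "__main__":
    groups = [np.array([3.0, -3.0]), np.array([-3.0, 3.0])]  # two opposing groups
    data = simulate_labelers(groups, n_labelers=20, n_pairs=100, seed=1)
    thetas, assign = cluster_rewards(data, k=2)
    print("assignments:", assign)
    print("cluster rewards:", np.round(np.array(thetas), 2))
    pooled, _ = cluster_rewards(data, k=1)
    print("single pooled reward:", np.round(pooled[0], 2))
```

With two equally sized, opposing groups, the `k=1` call fits one shared reward. By symmetry, that shared reward should land near zero. This is a concrete instance of the bias introduced by assuming homogeneous preferences.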
Taxonomy
Topics: Fuzzy Logic and Control Systems · Neural Networks and Applications
