Exploring Re-inforcement Learning via Human Feedback under User Heterogeneity
Sarvesh Shashidhar, Abhishek Mishra, Madhav Kotecha

TL;DR
This paper investigates how to improve reinforcement learning from human feedback by clustering workers based on their preferences and personalizing reward models, addressing user heterogeneity.
Contribution
It introduces an algorithm that jointly learns reward models and worker embeddings, and reports improved performance from clustering workers on a real dataset (Reddit TL;DR with unique worker IDs).
Findings
Clustering workers enhances reward model accuracy.
Personalized reward models increase win-rate.
Empirical results support the effectiveness of the approach.
Abstract
Reinforcement learning from human feedback (RLHF) has been effective for AI alignment. However, a key assumption of RLHF is that the annotators (referred to as workers from here on) have a homogeneous response space. This assumption does not hold in most practical settings, and past studies have challenged it. Inspired by such studies, this work explores one way to deal with heterogeneity in worker preferences: clustering workers with similar preferences and personalising reward models for each cluster. It provides an algorithm that encourages simultaneous learning of reward models and worker embeddings. The algorithm is then tested empirically on the Reddit TL;DR dataset with unique worker IDs. We have shown that clustering users into different groups based on their preferences and created…
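The general idea in the abstract (per-worker embeddings, then cluster-level reward models) can be illustrated with a small sketch. This is not the paper's algorithm: it assumes a linear reward whose weight vector doubles as the worker embedding, a Bradley-Terry preference likelihood, synthetic data in place of Reddit TL;DR, and a plain k-means step applied after fitting rather than learned jointly. The names `fit_worker_embeddings` and `kmeans` are hypothetical helpers introduced only for this illustration.

```python
import numpy as np

def fit_worker_embeddings(x_a, x_b, worker, prefers_a, n_workers, lr=0.5, epochs=200):
    """Fit one linear reward vector per worker (assumed Bradley-Terry model).

    x_a, x_b  : (n, d) feature vectors of the two compared responses
    worker    : (n,) integer worker id for each comparison
    prefers_a : (n,) 1.0 if the worker preferred response a, else 0.0
    """
    emb = np.zeros((n_workers, x_a.shape[1]))
    diff = x_a - x_b
    counts = np.maximum(np.bincount(worker, minlength=n_workers), 1)[:, None]
    for _ in range(epochs):
        # P(a preferred) = sigmoid(r_w(a) - r_w(b)) with r_w(x) = emb[w] . x
        logits = np.clip(np.sum(emb[worker] * diff, axis=1), -30.0, 30.0)
        p_a = 1.0 / (1.0 + np.exp(-logits))
        # Gradient of the log-likelihood, accumulated per worker
        grad = np.zeros_like(emb)
        np.add.at(grad, worker, (prefers_a - p_a)[:, None] * diff)
        emb += lr * grad / counts
    return emb

def kmeans(points, k, iters=50):
    """Plain k-means with deterministic farthest-point initialisation."""
    centers = [points[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((points - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(iters):
        sq = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = np.argmin(sq, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels, centers

# Synthetic stand-in data: two groups of workers with opposite preferences.
rng = np.random.default_rng(0)
n_workers, d, n = 6, 4, 1200
group = np.array([0, 0, 0, 1, 1, 1])          # hidden ground-truth grouping
true_w = np.array([1.0, 1.0, 0.0, 0.0])       # group 1 prefers the opposite
x_a, x_b = rng.normal(size=(n, d)), rng.normal(size=(n, d))
worker = rng.integers(0, n_workers, size=n)
sign = np.where(group[worker] == 0, 1.0, -1.0)
prefers_a = (sign * ((x_a - x_b) @ true_w) > 0).astype(float)

emb = fit_worker_embeddings(x_a, x_b, worker, prefers_a, n_workers)
labels, centers = kmeans(emb, k=2)

# Cluster-level reward: the centroid acts as a shared reward vector per cluster.
cluster_pred = np.sum(centers[labels[worker]] * (x_a - x_b), axis=1) > 0
cluster_acc = np.mean(cluster_pred == (prefers_a > 0.5))
print("cluster assignment per worker:", labels)
print("cluster-level preference accuracy:", round(float(cluster_acc), 3))
```

In this toy setting a single shared reward vector could not fit both groups, since their preferences are exact opposites, which is the kind of heterogeneity the abstract argues a homogeneous RLHF reward model overlooks.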
Taxonomy
Topics: Mobile Crowdsensing and Crowdsourcing · Recommender Systems and Techniques · Ethics and Social Impacts of AI
