PAL: Pluralistic Alignment Framework for Learning from Heterogeneous Preferences

Daiwei Chen; Yi Chen; Aniket Rege; Ramya Korlakai Vinayak

arXiv:2406.08469 · cs.LG · June 13, 2024

TL;DR

PAL introduces a novel framework for learning from diverse human preferences, enabling foundation models to better adapt to a plurality of opinions and improving reward modeling efficiency across multiple domains.

Contribution

The paper proposes a preference modeling framework using the ideal point model and mixture modeling to capture preference plurality and generalize to unseen users, enhancing reward model training.
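
The ideal point model named above can be illustrated with a small sketch. It assumes, as a hypothetical simplification rather than the authors' implementation, that each user's ideal point is a convex combination of K shared prototype points and that preferences follow a logistic (BTL-style) link on squared distances: a user at u prefers x over y with probability σ(‖y − u‖² − ‖x − u‖²). The helper names `mix_ideal_point` and `preference_prob` are illustrative only.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def mix_ideal_point(prototypes: Sequence[Point], weights: Sequence[float]) -> List[float]:
    """Hypothetical mixture step: the ideal point is a convex combination of prototypes."""
    if len(prototypes) != len(weights):
        raise ValueError("need one weight per prototype")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must lie on the probability simplex")
    dim = len(prototypes[0])
    return [sum(w * p[d] for w, p in zip(weights, prototypes)) for d in range(dim)]


def preference_prob(user_point: Point, x: Point, y: Point) -> float:
    """P(user prefers x over y): the item closer to the ideal point is favored."""
    margin = sq_dist(y, user_point) - sq_dist(x, user_point)
    return sigmoid(margin)


# Two prototypes, and two items placed exactly at the prototypes.
prototypes = [(1.0, 0.0), (0.0, 1.0)]
x, y = (1.0, 0.0), (0.0, 1.0)

for weights in [(1.0, 0.0), (0.0, 1.0), (0.5, 0.5)]:
    u = mix_ideal_point(prototypes, weights)
    print(weights, round(preference_prob(u, x, y), 3))  # 0.881, then 0.119, then 0.5
```

A user anchored at the first prototype favors x, one anchored at the second favors y, and an evenly mixed user is indifferent; a single shared reward score per item could not represent all three users at once.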

Findings

01

PAL achieves competitive reward model accuracy on language, image, and heterogeneous datasets.

02

The approach effectively captures diverse preferences and improves few-shot generalization.

03

Current preference datasets may oversimplify preferences, highlighting the need for nuanced data collection.
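
Finding 02 mentions few-shot generalization. Under the ideal point view, one way this can work is to keep the shared prototypes frozen and estimate only a new user's mixture weight from a handful of comparisons. The sketch below rests on our own simplifying assumptions (two fixed prototypes, a single weight w, and a coarse grid search in place of the gradient-based training a real system would likely use); `fit_new_user_weight` is a hypothetical helper, not part of the paper's code.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def log_sigmoid(z: float) -> float:
    """Numerically stable log of the logistic function."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def fit_new_user_weight(p0: Point, p1: Point,
                        comparisons: List[Tuple[Point, Point]],
                        grid_size: int = 101) -> float:
    """Grid-search w in [0, 1] so that u = w*p0 + (1-w)*p1 maximizes the
    log-likelihood of the observed (winner, loser) comparisons."""
    best_w, best_ll = 0.0, -math.inf
    for i in range(grid_size):
        w = i / (grid_size - 1)
        u = [w * a + (1 - w) * b for a, b in zip(p0, p1)]
        # Winner should sit closer to u than the loser.
        ll = sum(log_sigmoid(sq_dist(loser, u) - sq_dist(winner, u))
                 for winner, loser in comparisons)
        if ll > best_ll:
            best_w, best_ll = w, ll
    return best_w


# Three comparisons from a new user who always picks the item at prototype p0.
p0, p1 = (1.0, 0.0), (0.0, 1.0)
print(fit_new_user_weight(p0, p1, [(p0, p1)] * 3))  # 1.0
```

Only one number is estimated per new user here, which is why a few comparisons can suffice; with more prototypes the grid search would be replaced by constrained optimization over the simplex.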

Abstract

Large foundation models pretrained on raw web-scale data are not readily deployable without an additional step of extensive alignment to human preferences. Such alignment is typically done by collecting large amounts of pairwise comparisons from humans ("Do you prefer output A or B?") and learning a reward model or a policy with the Bradley-Terry-Luce (BTL) model as a proxy for a human's underlying implicit preferences. These methods generally assume a universal preference shared by all humans, which lacks the flexibility to adapt to a plurality of opinions and preferences. In this work, we propose PAL, a framework for modeling human preferences that is complementary to existing pretraining strategies and incorporates plurality from the ground up. We propose using the ideal point model as a lens to view alignment using preference comparisons. Together with our novel reformulation and…
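
The BTL model mentioned in the abstract scores each output with a scalar reward r and sets P(A ≻ B) = σ(r_A − r_B). The minimal single-pair sketch below (helper names are hypothetical) fits the reward gap by maximum likelihood and shows why one universal reward struggles with plurality: pooling two groups that each agree unanimously, but with each other not at all, yields an indifferent model.

```python
import math


def btl_prob(r_a: float, r_b: float) -> float:
    """Bradley-Terry-Luce: probability that A is preferred over B."""
    z = r_a - r_b
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # stable branch for negative reward gaps
    return e / (1.0 + e)


def fit_reward_gap(wins_a: int, wins_b: int) -> float:
    """Maximum-likelihood estimate of r_a - r_b for a single pair.

    The log-likelihood wins_a*log σ(d) + wins_b*log σ(-d) is maximized where
    σ(d) = wins_a / (wins_a + wins_b), i.e. d = log(wins_a / wins_b).
    """
    if wins_a <= 0 or wins_b <= 0:
        raise ValueError("the MLE is finite only if both outcomes were observed")
    return math.log(wins_a / wins_b)


# Hypothetical data: group 1 (50 raters) always picks A, group 2 (50 raters) always picks B.
gap = fit_reward_gap(wins_a=50, wins_b=50)
print(gap, btl_prob(gap, 0.0))  # 0.0 0.5
```

The pooled model predicts 0.5 for every rater even though each group is unanimous; this mismatch is the limitation of a universal preference that a pluralistic model such as PAL is designed to address.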

Code & Models

Repositories

RamyaLab/pluralistic-alignment
PyTorch · Official

Taxonomy

Topics: Intelligent Tutoring Systems and Adaptive Learning · Bayesian Modeling and Causal Inference · Logic, Reasoning, and Knowledge