Pairwise Calibrated Rewards for Pluralistic Alignment

Daniel Halpern; Evi Micha; Ariel D. Procaccia; Itai Shapira

arXiv:2506.06298 · cs.LG · June 10, 2025

TL;DR

This paper models diverse human preferences in AI alignment by learning a distribution over multiple reward functions from pairwise preference data, so that annotator disagreement is represented in the ensemble rather than collapsed into a majority signal.

Contribution

It proposes a novel pairwise calibration approach to learn reward ensembles that reflect diverse human preferences without predefined groups or annotator IDs.

Findings

01 Improved calibration of reward ensembles to human preferences

02 Effective learning heuristic for training reward distributions

03 Accurate representation of pluralistic human values

Abstract

Current alignment pipelines presume a single, universal notion of desirable behavior. However, human preferences often diverge across users, contexts, and cultures. As a result, disagreement collapses into the majority signal and minority perspectives are discounted. To address this, we propose reflecting diverse human preferences through a distribution over multiple reward functions, each inducing a distinct aligned policy. The distribution is learned directly from pairwise preferences without annotator identifiers or predefined groups. Instead, annotator disagreements are treated as informative soft labels. Our central criterion is pairwise calibration: for every pair of candidate responses, the proportion of reward functions preferring one response matches the fraction of annotators with that preference. We prove that even a small outlier-free ensemble can accurately represent diverse…
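The pairwise calibration criterion in the abstract can be illustrated with a small sketch. It assumes a weighted ensemble of reward functions and, for each response pair, a known fraction of annotators preferring the first response. The function names, the strict-inequality treatment of ties, and the mean squared gap used as the error measure are all illustrative assumptions; the paper's exact objective is not stated on this page.

```python
from typing import Callable, Dict, List, Tuple

Reward = Callable[[str], float]  # hypothetical type: maps a response to a score


def ensemble_preference(rewards: List[Reward], weights: List[float],
                        a: str, b: str) -> float:
    """Weighted share of reward functions that score `a` strictly above `b`."""
    total = sum(weights)
    favoring_a = sum(w for r, w in zip(rewards, weights) if r(a) > r(b))
    return favoring_a / total


def calibration_error(rewards: List[Reward], weights: List[float],
                      annotator_fractions: Dict[Tuple[str, str], float]) -> float:
    """Mean squared gap between the ensemble share and the annotator fraction.

    The squared gap is an assumed measure chosen for illustration only.
    """
    gaps = [
        (ensemble_preference(rewards, weights, a, b) - p) ** 2
        for (a, b), p in annotator_fractions.items()
    ]
    return sum(gaps) / len(gaps)


# Toy data: two of three reward functions favor longer responses.
long_resp = "a much longer and more detailed answer"
short_resp = "short answer"
rewards = [
    lambda s: float(len(s)),    # prefers longer responses
    lambda s: -float(len(s)),   # prefers shorter responses
    lambda s: float(len(s)),    # prefers longer responses
]
weights = [1.0, 1.0, 1.0]       # uniform distribution over the ensemble

# Assumed annotation data: two thirds of annotators prefer the long response.
fractions = {(long_resp, short_resp): 2 / 3}

err = calibration_error(rewards, weights, fractions)
print(f"calibration error: {err:.6f}")
```

In this toy case the ensemble share matches the annotator fraction, so the error is zero up to floating-point precision. An ensemble in which every reward function agreed with the majority would give a share of 1 and a nonzero gap, which is the collapse into the majority signal that the abstract describes.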

Videos

Pairwise Calibrated Rewards for Pluralistic Alignment · SlidesLive

Taxonomy

Topics: Mobile Crowdsensing and Crowdsourcing · Recommender Systems and Techniques · Data Quality and Management