Beyond the Binary: Capturing Diverse Preferences With Reward Regularization

Vishakh Padmakumar; Chuanyang Jin; Hannah Rose Kirk; He He

arXiv:2412.03822·cs.CL·December 6, 2024

Open Access

TL;DR

This paper highlights the limitations of binary preference judgments in capturing diverse user preferences for LLMs and proposes a reward regularization method using synthetic preferences to better align models with real-world user diversity.

Contribution

The paper introduces a taxonomy of preference subjectivity and a regularization technique that incorporates synthetic preferences when training reward models.

Findings

01. Reward models weakly correlate with user preferences in subjective cases.

02. Synthetic preference augmentation improves alignment with user preferences.

03. Regularization with synthetic preferences enhances reward model performance.

Abstract

Large language models (LLMs) are increasingly deployed via public-facing interfaces to interact with millions of users, each with diverse preferences. Despite this, preference tuning of LLMs predominantly relies on reward models trained using binary judgments where annotators select the preferred choice out of pairs of model outputs. In this work, we argue that this reliance on binary choices does not capture the broader, aggregate preferences of the target user in real-world tasks. We propose a taxonomy that identifies two dimensions of subjectivity where different users disagree on the preferred output, namely, the Plurality of Responses to Prompts, where prompts allow for multiple correct answers, and the Indistinguishability of Responses, where candidate outputs are paraphrases of each other. We show that reward models correlate weakly with user preferences in these cases. As a first…
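To make the setup concrete, here is a minimal sketch of a pairwise reward-model loss trained on binary judgments, with an optional margin that could carry information from synthetic preferences. The abstract is truncated before it describes the regularizer, so the margin form, the function names `pairwise_loss` and `agreement_margin`, and the `scale` parameter are all assumptions chosen for illustration and may not match the authors' method.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def pairwise_loss(r_chosen: float, r_rejected: float, margin: float = 0.0) -> float:
    """Bradley-Terry style negative log-likelihood for one binary judgment.

    Equals -log(sigmoid(r_chosen - r_rejected - margin)). With margin = 0
    this is the standard binary-preference loss.
    """
    return softplus(-(r_chosen - r_rejected - margin))


def agreement_margin(p_agree: float, scale: float = 1.0) -> float:
    """Hypothetical mapping from synthetic agreement to a margin.

    p_agree is the fraction of synthetic judgments that side with the
    binary label. An even split (0.5) yields no margin, so the model is
    not pushed to separate responses that judges cannot tell apart.
    """
    return scale * max(0.0, 2.0 * p_agree - 1.0)


if __name__ == "__main__":
    # Same reward gap, different levels of synthetic agreement.
    for p in (0.5, 0.75, 1.0):
        m = agreement_margin(p)
        print(p, m, pairwise_loss(1.0, 0.0, margin=m))
```

Under this assumed form, pairs with high synthetic agreement demand a larger reward gap, while contested pairs (plural or indistinguishable responses) fall back to the plain binary loss.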

Taxonomy

Topics: Decision-Making and Behavioral Economics · Game Theory and Voting Systems

Methods: ALIGN