Jackpot! Alignment as a Maximal Lottery

Roberto-Rafael Maura-Rivero; Marc Lanctot; Francesco Visin; Kate Larson

arXiv:2501.19266 · cs.AI · February 3, 2025

Open Access

TL;DR

This paper proposes maximal lotteries, a probabilistic social choice rule, as a new approach to aligning large language models with human values, addressing known limitations of reinforcement learning from human feedback (RLHF).
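For readers unfamiliar with the term, the standard social-choice definition of a maximal lottery can be written as follows. The notation here (the alternative set $A$, the margin matrix $M$, the lottery $p$) is chosen for illustration and is not taken from the paper.

```latex
% M_{ab}: fraction of annotators preferring a to b, minus the fraction preferring b to a.
\[
  M_{ab} = \Pr[a \succ b] - \Pr[b \succ a], \qquad M = -M^{\top}.
\]
% A lottery p over A is maximal if no alternative beats it in expectation.
\[
  p \in \Delta(A) \text{ is a maximal lottery}
  \iff
  \sum_{a \in A} p_a \, M_{ab} \ge 0 \quad \text{for all } b \in A.
\]
```

Because $M$ is skew-symmetric, it defines a symmetric two-player zero-sum game with value zero, and the maximal lotteries are exactly the equilibrium strategies of that game. When one alternative beats every other by majority (a Condorcet winner), the lottery that puts all probability on it satisfies the condition.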

Contribution

It introduces maximal lotteries as an alignment mechanism and shows that existing techniques such as Nash Learning from Human Feedback (NLHF) approximate maximal lottery outcomes and therefore inherit their beneficial properties.
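To make the approximation claim concrete, the sketch below runs multiplicative-weights (Hedge) self-play on a small pairwise-margin matrix and averages the iterates. This is a generic no-regret procedure for symmetric zero-sum games, not the NLHF algorithm itself; the matrix `MARGINS`, the function names, the step count, and the learning rate are all illustrative assumptions. A lottery is maximal when its expected margin against every alternative is non-negative, so a `worst_case_margin` close to zero indicates an approximate maximal lottery.

```python
import math

# Hypothetical cyclic margins: MARGINS[a][b] is the share of annotators
# preferring a to b minus the share preferring b to a (skew-symmetric).
MARGINS = [
    [0.0, 0.2, -0.6],
    [-0.2, 0.0, 0.4],
    [0.6, -0.4, 0.0],
]


def hedge_self_play(margins, steps=20000, eta=0.02):
    """Average policy of Hedge playing the margin game against itself."""
    n = len(margins)
    cumulative = [0.0] * n  # summed payoffs of each alternative so far
    average = [0.0] * n     # running time-average of the policies
    for _ in range(steps):
        # Softmax of the scaled cumulative payoffs (shifted for stability).
        top = max(cumulative)
        weights = [math.exp(eta * (c - top)) for c in cumulative]
        total = sum(weights)
        policy = [w / total for w in weights]
        for a in range(n):
            average[a] += policy[a] / steps
        # Expected margin of each alternative against the current policy.
        for a in range(n):
            cumulative[a] += sum(margins[a][b] * policy[b] for b in range(n))
    return average


def worst_case_margin(lottery, margins):
    """Smallest expected margin of the lottery against any single alternative."""
    n = len(margins)
    return min(
        sum(lottery[a] * margins[a][b] for a in range(n)) for b in range(n)
    )


lottery = hedge_self_play(MARGINS)
print([round(p, 3) for p in lottery])
print(round(worst_case_margin(lottery, MARGINS), 4))
```

Averaging matters in this sketch: with a preference cycle the individual Hedge iterates keep rotating, and the standard no-regret guarantee applies to the time-averaged policy rather than to the last iterate.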

Findings

01

Respects majority preferences better than RLHF

02

Provides principled handling of non-transitivities

03

Increases robustness to irrelevant alternatives
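The majority and non-transitivity findings can be illustrated on toy preference profiles. The sketch below assumes each annotator supplies a full ranking, builds the pairwise-margin matrix, and checks the usual maximal-lottery condition (non-negative expected margin against every alternative). The profiles, the names `margin_matrix` and `is_maximal`, and the three alternatives are hypothetical and do not come from the paper's experiments.

```python
from fractions import Fraction


def margin_matrix(rankings, alternatives):
    """margins[a][b] = share ranking a above b minus share ranking b above a."""
    n = len(rankings)
    margins = {a: {b: Fraction(0) for b in alternatives} for a in alternatives}
    for ranking in rankings:
        position = {alt: i for i, alt in enumerate(ranking)}
        for a in alternatives:
            for b in alternatives:
                if a != b:
                    sign = 1 if position[a] < position[b] else -1
                    margins[a][b] += Fraction(sign, n)
    return margins


def is_maximal(lottery, margins):
    """True if the lottery's expected margin against every alternative is >= 0."""
    return all(
        sum(lottery[a] * margins[a][b] for a in margins) >= 0 for b in margins
    )


ALTS = ["a", "b", "c"]
uniform = {x: Fraction(1, 3) for x in ALTS}
only_a = {"a": Fraction(1), "b": Fraction(0), "c": Fraction(0)}

# Non-transitive profile: a beats b, b beats c, c beats a (a Condorcet cycle).
cycle = [("a", "b", "c"), ("b", "c", "a"), ("c", "a", "b")]
cycle_margins = margin_matrix(cycle, ALTS)
print(is_maximal(uniform, cycle_margins))  # True
print(is_maximal(only_a, cycle_margins))   # False

# Majority profile: two of three annotators rank a first.
majority = [("a", "b", "c"), ("a", "c", "b"), ("b", "c", "a")]
majority_margins = margin_matrix(majority, ALTS)
print(is_maximal(only_a, majority_margins))   # True
print(is_maximal(uniform, majority_margins))  # False
```

In the cyclic profile no single response can be defended against every alternative, so the lottery randomizes; in the majority profile all probability goes to the response that wins every pairwise comparison.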

Abstract

Reinforcement Learning from Human Feedback (RLHF), the standard for aligning Large Language Models (LLMs) with human values, is known to fail to satisfy properties that are intuitively desirable, such as respecting the preferences of the majority (Ge et al., 2024). To overcome these issues, we propose the use of a probabilistic Social Choice rule called maximal lotteries as a replacement for RLHF. We show that a family of alignment techniques, namely Nash Learning from Human Feedback (NLHF) (Munos et al., 2023) and variants, approximate maximal lottery outcomes and thus inherit the beneficial properties of maximal lotteries. We confirm experimentally that our proposed methodology handles situations that arise when working with preferences more robustly than standard RLHF, including supporting the preferences of the majority, providing principled ways of handling non-transitivities in the…

Taxonomy

Topics: Digital Games and Media · Media, Gender, and Advertising · Gambling Behavior and Treatments