A density estimation perspective on learning from pairwise human preferences

Vincent Dumoulin; Daniel D. Johnson; Pablo Samuel Castro; Hugo Larochelle; Yann Dauphin

arXiv:2311.14115·cs.LG·January 11, 2024·1 cites

TL;DR

This paper reinterprets learning from human preferences as a density estimation problem, providing theoretical insights and empirical evidence on how reward models capture implicit preference distributions and highlighting challenges with annotator diversity.

Contribution

It introduces a density estimation perspective on learning from pairwise preferences, offering new theoretical understanding and empirical validation of reward modeling.

Findings

01 Training a reward function on pairwise preferences models an annotator's implicit preference distribution

02 Preference modeling can fail under annotator misspecification

03 A density estimation perspective clarifies learning dynamics in LHF

Abstract

Learning from human feedback (LHF) -- and in particular learning from pairwise preferences -- has recently become a crucial ingredient in training large language models (LLMs), and has been the subject of much research. Most recent works frame it as a reinforcement learning problem, where a reward function is learned from pairwise preference data and the LLM is treated as a policy which is adapted to maximize the rewards, often under additional regularization constraints. We propose an alternative interpretation which centers on the generative process for pairwise preferences and treats LHF as a density estimation problem. We provide theoretical and empirical results showing that for a family of generative processes defined via preference behavior distribution equations, training a reward function on pairwise preferences effectively models an annotator's implicit preference…
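
The central claim of the abstract can be illustrated with a minimal sketch, which is not the authors' code. It assumes a Luce/Bradley-Terry choice rule, in which an annotator prefers item A over item B with probability p*(A) / (p*(A) + p*(B)). The abstract does not spell out its preference behavior distribution equations, so this rule, the toy five-item distribution, and the function names `sample_preferences` and `fit_rewards` are all assumptions made for illustration. Under that rule, fitting a reward with the usual logistic pairwise loss should recover log p* up to an additive constant, so a softmax over the learned rewards approximates the annotator's implicit preference distribution.

```python
import numpy as np


def sample_preferences(p_star, n_pairs, rng):
    """Simulate an annotator (assumed Luce choice rule) on random item pairs."""
    k = len(p_star)
    a = rng.integers(0, k, size=n_pairs)
    b = rng.integers(0, k, size=n_pairs)
    # Assumption: P(A preferred over B) = p*(A) / (p*(A) + p*(B)).
    prob_a = p_star[a] / (p_star[a] + p_star[b])
    a_wins = rng.random(n_pairs) < prob_a
    winners = np.where(a_wins, a, b)
    losers = np.where(a_wins, b, a)
    return winners, losers


def fit_rewards(winners, losers, k, steps=500, lr=1.0):
    """Fit one scalar reward per item by gradient descent on the
    pairwise logistic loss -log sigmoid(r[winner] - r[loser])."""
    r = np.zeros(k)
    n = len(winners)
    for _ in range(steps):
        # Probability the current model assigns to the wrong outcome.
        p_wrong = 1.0 / (1.0 + np.exp(r[winners] - r[losers]))
        # Gradient of the summed loss: -p_wrong for winners, +p_wrong for losers.
        grad = (np.bincount(losers, weights=p_wrong, minlength=k)
                - np.bincount(winners, weights=p_wrong, minlength=k))
        r -= lr * grad / n
    return r


def rewards_to_density(r):
    """Softmax: turn rewards into a normalized distribution over items."""
    z = np.exp(r - r.max())
    return z / z.sum()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical implicit preference distribution over five items.
    p_star = np.array([0.4, 0.3, 0.15, 0.1, 0.05])
    winners, losers = sample_preferences(p_star, 50_000, rng)
    rewards = fit_rewards(winners, losers, len(p_star))
    print("true     :", p_star)
    print("recovered:", np.round(rewards_to_density(rewards), 3))
```

The rewards are only identified up to a shared offset, which is why the comparison is made after the softmax rather than on the raw reward values. If the simulated annotator followed a different choice rule than the one the loss assumes, the recovered distribution would generally not match p*, which is one simple way to picture a mismatch between modeling assumptions and annotator behavior.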

Code & Models

Repositories

google-deepmind/pbde
Official

Taxonomy

Topics: Speech and dialogue systems · Topic Modeling · Natural Language Processing Techniques