A density estimation perspective on learning from pairwise human preferences
Vincent Dumoulin, Daniel D. Johnson, Pablo Samuel Castro, Hugo Larochelle, Yann Dauphin

TL;DR
This paper reinterprets learning from human preferences as a density estimation problem, providing theoretical insights and empirical evidence on how reward models capture implicit preference distributions and highlighting challenges with annotator diversity.
Contribution
It introduces a density estimation perspective on learning from pairwise preferences, offering new theoretical understanding and empirical validation of reward modeling.
Findings
Training a reward function on pairwise preferences models the annotator's implicit preference distribution
Preference modeling can fail under annotator misspecification
Density estimation perspective clarifies learning dynamics in LHF
Abstract
Learning from human feedback (LHF) -- and in particular learning from pairwise preferences -- has recently become a crucial ingredient in training large language models (LLMs), and has been the subject of much research. Most recent works frame it as a reinforcement learning problem, where a reward function is learned from pairwise preference data and the LLM is treated as a policy which is adapted to maximize the rewards, often under additional regularization constraints. We propose an alternative interpretation which centers on the generative process for pairwise preferences and treats LHF as a density estimation problem. We provide theoretical and empirical results showing that for a family of generative processes defined via preference behavior distribution equations, training a reward function on pairwise preferences effectively models an annotator's implicit preference…
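The density estimation reading can be illustrated with a minimal sketch. It assumes a Luce choice rule, where an annotator prefers item a over item b with probability p(a) / (p(a) + p(b)), paired with a Bradley-Terry reward model. This excerpt does not spell out the paper's family of generative processes, so the choice rule, the item probabilities, and all variable names below are illustrative assumptions rather than the authors' setup. Under these assumptions, sigmoid(r(a) - r(b)) matches the choice rule exactly when exp(r) is proportional to p, so normalizing exp(r) should approximately recover the preference distribution.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ground truth: the annotator's implicit preference
# distribution over K = 5 discrete items (illustrative values).
p_true = np.array([0.40, 0.30, 0.15, 0.10, 0.05])
K = p_true.size

# Simulate pairwise comparisons under the assumed Luce choice rule:
# P(a preferred over b) = p(a) / (p(a) + p(b)).
a = rng.integers(0, K, size=50_000)
b = rng.integers(0, K, size=50_000)
keep = a != b  # drop degenerate pairs that compare an item with itself
a, b = a[keep], b[keep]
a_wins = rng.random(a.size) < p_true[a] / (p_true[a] + p_true[b])
winner = np.where(a_wins, a, b)
loser = np.where(a_wins, b, a)

# wins[i, j] counts how often item i was preferred over item j.
wins = np.zeros((K, K))
np.add.at(wins, (winner, loser), 1.0)

# Fit one scalar reward per item by gradient descent on the
# Bradley-Terry negative log-likelihood, where
# P_model(i preferred over j) = sigmoid(r[i] - r[j]).
r = np.zeros(K)
learning_rate = 1.0
for _ in range(1000):
    s = 1.0 / (1.0 + np.exp(-(r[:, None] - r[None, :])))
    g = wins * (1.0 - s)
    grad = -(g.sum(axis=1) - g.sum(axis=0)) / wins.sum()
    r -= learning_rate * grad

# Normalizing exp(reward) yields the density estimate.
p_est = np.exp(r - r.max())
p_est /= p_est.sum()

print("true preference distribution:", np.round(p_true, 3))
print("estimate from learned reward:", np.round(p_est, 3))
```

The rewards are only identified up to an additive constant, which is why the sketch compares normalized exp(r) with p_true instead of comparing r with log p_true directly.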
Taxonomy
Topics: Speech and dialogue systems · Topic Modeling · Natural Language Processing Techniques
