Aligning Language Models with Human Preferences via a Bayesian Approach

Jiashuo Wang; Haozhao Wang; Shichao Sun; Wenjie Li

arXiv:2310.05782 · cs.CL · January 17, 2024 · 1 citation

TL;DR

This paper introduces a Bayesian approach to better align language models with human preferences by modeling preference disagreements, improving performance over existing methods in human-centric NLG tasks.

Contribution

It proposes a novel Bayesian framework (d-PM) to capture human preference disagreements and uses contrastive learning for efficient NLG training, surpassing prior methods.

Findings

01. Outperforms previous state-of-the-art models in automatic evaluations

02. Achieves higher human satisfaction scores

03. Demonstrates robustness across multiple NLG tasks

Abstract

In the quest to advance human-centric natural language generation (NLG) systems, ensuring alignment between NLG models and human preferences is crucial. For this alignment, current popular methods leverage a reinforcement learning (RL) approach with a reward model trained on feedback from humans. However, inherent disagreements due to the subjective nature of human preferences pose a significant challenge for training the reward model, resulting in a deterioration of NLG performance. To tackle this issue, previous approaches typically rely on majority voting or averaging to consolidate multiple inconsistent preferences into a merged one. Although straightforward to understand and execute, such methods suffer from an inability to capture the nuanced degrees of disagreement among humans and may only represent a specialized subset of individuals, thereby lacking the ability to…
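To make the abstract's point about majority voting and averaging concrete, here is a minimal Python sketch. It is not the paper's d-PM. The annotation lists are hypothetical, and `dirichlet_posterior_mean` is an illustrative helper assumed for this example only: it applies a generic Dirichlet-smoothed estimate to show one way a distribution over preferences can be retained instead of a single merged label.

```python
from collections import Counter
from statistics import mean


def majority_vote(labels):
    """Return the most frequent label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


def dirichlet_posterior_mean(labels, categories, alpha=1.0):
    """Posterior mean over categories under a symmetric Dirichlet(alpha) prior.

    Illustrative only: a generic Bayesian estimate, not the paper's method.
    """
    counts = Counter(labels)
    total = len(labels) + alpha * len(categories)
    return {c: (counts[c] + alpha) / total for c in categories}


# Hypothetical binary annotations: 1 = "preferred", 0 = "not preferred".
unanimous = [1, 1, 1, 1, 1]
contested = [1, 1, 1, 0, 0]

# Hypothetical 1-5 ratings with the same mean but very different spread.
agreeing = [3, 3, 3, 3]
polarized = [1, 5, 1, 5]

# Majority voting assigns the same merged label to both binary cases.
print(majority_vote(unanimous), majority_vote(contested))

# Averaging assigns the same merged score to both rating cases.
print(mean(agreeing), mean(polarized))

# A distributional estimate keeps the two binary cases apart.
print(dirichlet_posterior_mean(unanimous, [0, 1]))
print(dirichlet_posterior_mean(contested, [0, 1]))
```

In this toy setup both majority votes come out as 1 and both means as 3, so the merged values cannot distinguish a unanimous group from a divided one. The distributional estimate gives the contested case noticeably less mass on label 1 than the unanimous case.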

Code & Models

Repositories

wangjs9/aligned-dpm (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Speech and dialogue systems · Natural Language Processing Techniques

Methods: Contrastive Learning