Understanding Likelihood Over-optimisation in Direct Alignment Algorithms

Zhengyan Shi; Sander Land; Acyr Locatelli; Matthieu Geist; Max Bartolo

arXiv:2410.11677 · cs.CL · October 21, 2024

Open Access

TL;DR

This paper investigates how likelihood over-optimisation in direct alignment algorithms can harm language model performance, revealing that balancing likelihood and diversity is crucial for better alignment with human preferences.

Contribution

The study characterises the relationship between completion likelihood and model performance in direct alignment algorithms (DAAs), and proposes indicators for detecting over-optimisation before it harms alignment.

Findings

01

Higher likelihood correlates with memorisation of factual knowledge.

02

Lower likelihood can improve output diversity and generalisation.

03

Decreasing entropy over the top-k tokens and diminishing top-k probability mass are signs of over-optimisation.
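A minimal sketch of how the two indicators in finding 03 might be computed for a single next-token distribution. The function name `top_k_indicators`, the choice to renormalise the top-k probabilities before taking the entropy, and the example logits are assumptions for illustration, not the paper's implementation.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_indicators(logits, k):
    """Return (entropy over top-k tokens, top-k probability mass).

    Assumption: the entropy is taken over the top-k probabilities
    after renormalising them to sum to 1.
    """
    top = sorted(softmax(logits), reverse=True)[:k]
    mass = sum(top)
    renorm = [p / mass for p in top]
    entropy = -sum(p * math.log(p) for p in renorm if p > 0.0)
    return entropy, mass


# Uniform distribution over 4 tokens, k=2: mass is 0.5, entropy is ln 2.
flat_entropy, flat_mass = top_k_indicators([0.0, 0.0, 0.0, 0.0], k=2)

# A sharply peaked distribution gives a much lower top-k entropy.
peaked_entropy, peaked_mass = top_k_indicators([10.0, 0.0, 0.0, 0.0], k=2)
```

In practice one would presumably average these per-token values over sampled completions and track them across training checkpoints, watching for both to fall together.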

Abstract

Direct Alignment Algorithms (DAAs), such as Direct Preference Optimisation (DPO) and Identity Preference Optimisation (IPO), have emerged as alternatives to online Reinforcement Learning from Human Feedback (RLHF) algorithms such as Proximal Policy Optimisation (PPO) for aligning language models to human preferences, without the need for explicit reward modelling. These methods generally aim to increase the likelihood of generating better (preferred) completions while discouraging worse (non-preferred) ones, while staying close to the original model's behaviour. In this work, we explore the relationship between completion likelihood and model performance in state-of-the-art DAAs, and identify a critical issue of likelihood over-optimisation. Contrary to expectations, we find that higher likelihood of better completions and larger margins between better and worse completion likelihoods…
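The objective described in the abstract (raise the likelihood of the preferred completion, lower that of the non-preferred one, stay close to the original model) can be sketched with the standard DPO loss for a single preference pair. This is the usual textbook form of DPO, not code from the paper; the function name `dpo_loss`, the default `beta=0.1`, and the use of summed sequence log-probabilities as inputs are assumptions for illustration.

```python
import math


def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss for one (preferred, non-preferred) pair.

    logp_w / logp_l: policy log-likelihoods of the preferred and
    non-preferred completions; ref_logp_*: the same quantities under
    the frozen reference model. beta scales the implicit KL penalty.
    """
    # Margin between the two log-likelihood ratios against the reference.
    margin = (logp_w - ref_logp_w) - (logp_l - ref_logp_l)
    x = beta * margin
    # -log(sigmoid(x)), written in a numerically stable form.
    if x > 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))


# Policy identical to the reference: margin is 0, loss is ln 2.
baseline = dpo_loss(-5.0, -5.0, -5.0, -5.0)

# Preferred completion becomes more likely, non-preferred less likely.
improved = dpo_loss(-4.0, -6.0, -5.0, -5.0)

# Same margin as `improved`, but both likelihoods have dropped.
both_lower = dpo_loss(-8.0, -10.0, -5.0, -5.0)
```

Because this loss depends only on the margin, `improved` and `both_lower` are equal: the objective does not by itself pin down the absolute likelihood of the preferred completion, which is why completion likelihood is worth monitoring separately.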

Taxonomy

Topics: Optimization and Packing Problems