Understanding Likelihood Over-optimisation in Direct Alignment Algorithms
Zhengyan Shi, Sander Land, Acyr Locatelli, Matthieu Geist, Max Bartolo

TL;DR
This paper investigates how likelihood over-optimisation in direct alignment algorithms can harm language model performance, revealing that balancing likelihood and diversity is crucial for better alignment with human preferences.
Contribution
The study characterises the relationship between completion likelihood and model performance in direct alignment algorithms (DAAs), and proposes indicators that signal when over-optimisation begins to harm alignment.
Findings
Higher completion likelihood tends to correlate with better memorisation of factual knowledge.
Slightly lower completion likelihood tends to improve output diversity and, in turn, generalisation.
Two indicators signal over-optimisation: decreasing entropy over the top-k tokens and diminishing top-k probability mass.
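The two indicators named in the findings can be illustrated with a minimal sketch for a single next-token distribution. The function names, the default `k`, and the choice to renormalise the top-k probabilities before computing entropy are assumptions made here for illustration; the paper's exact definitions may differ.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_indicators(logits, k=10):
    """Return (top-k probability mass, entropy over the top-k tokens).

    Assumption: entropy is computed over the top-k probabilities after
    renormalising them to sum to 1.
    """
    probs = softmax(logits)
    top = sorted(probs, reverse=True)[:k]
    mass = sum(top)
    renorm = [p / mass for p in top]
    entropy = -sum(p * math.log(p) for p in renorm if p > 0)
    return mass, entropy


# A flat distribution versus a sharply peaked one over four tokens.
flat_mass, flat_entropy = top_k_indicators([0.0, 0.0, 0.0, 0.0], k=2)
peak_mass, peak_entropy = top_k_indicators([10.0, 0.0, 0.0, 0.0], k=2)
```

Tracked over training steps and averaged over positions, a steady fall in either quantity would be the kind of signal the findings describe. How the values are aggregated across tokens and prompts is left open in this sketch.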
Abstract
Direct Alignment Algorithms (DAAs), such as Direct Preference Optimisation (DPO) and Identity Preference Optimisation (IPO), have emerged as alternatives to online Reinforcement Learning from Human Feedback (RLHF) algorithms such as Proximal Policy Optimisation (PPO) for aligning language models to human preferences, without the need for explicit reward modelling. These methods generally aim to increase the likelihood of generating better (preferred) completions while discouraging worse (non-preferred) ones, while staying close to the original model's behaviour. In this work, we explore the relationship between completion likelihood and model performance in state-of-the-art DAAs, and identify a critical issue of likelihood over-optimisation. Contrary to expectations, we find that higher likelihood of better completions and larger margins between better and worse completion likelihoods…
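To make the likelihood margin described in the abstract concrete, here is a minimal per-example sketch of the standard DPO and IPO objectives, written in terms of sequence log-probabilities under the policy and a frozen reference model. The function names, argument names, and default `beta` and `tau` values are illustrative assumptions and are not taken from the paper's code.

```python
import math


def log_ratio_margin(pol_w, pol_l, ref_w, ref_l):
    """Policy-vs-reference log-ratio of the preferred completion (w)
    minus that of the non-preferred completion (l)."""
    return (pol_w - ref_w) - (pol_l - ref_l)


def dpo_loss(pol_w, pol_l, ref_w, ref_l, beta=0.1):
    """DPO: -log(sigmoid(beta * margin)), computed as a stable softplus."""
    z = beta * log_ratio_margin(pol_w, pol_l, ref_w, ref_l)
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))


def ipo_loss(pol_w, pol_l, ref_w, ref_l, tau=0.1):
    """IPO: squared distance between the margin and the target 1 / (2 * tau)."""
    margin = log_ratio_margin(pol_w, pol_l, ref_w, ref_l)
    return (margin - 1.0 / (2.0 * tau)) ** 2


# Inputs are summed token log-probabilities of each completion.
at_reference = dpo_loss(-1.0, -2.0, -1.0, -2.0)   # margin is 0
wider_margin = dpo_loss(-0.5, -3.0, -1.0, -2.0)   # margin is 1.5
ipo_on_target = ipo_loss(-1.0, -3.0, -1.5, -2.5, tau=0.5)  # margin is 1.0
```

In this form the DPO loss keeps decreasing as the margin grows, whereas the IPO loss is minimised at a fixed margin of `1 / (2 * tau)`. That difference in how each objective treats ever-larger margins is one way to read the over-optimisation concern raised in the abstract.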
Taxonomy
Topics: Optimization and Packing Problems
