Correcting the Mythos of KL-Regularization: Direct Alignment without Overoptimization via Chi-Squared Preference Optimization
Audrey Huang, Wenhao Zhan, Tengyang Xie, Jason D. Lee, Wen Sun, Akshay Krishnamurthy, Dylan J. Foster

TL;DR
This paper introduces $\chi^2$-Preference Optimization, a simple yet theoretically robust offline alignment method that prevents overoptimization in language models by regularizing with the $\chi^2$-divergence instead of the KL-divergence.
Contribution
It proposes a minimal modification to DPO that implements $\chi^2$-divergence-based regularization, providing the first offline alignment algorithm that is provably robust to overoptimization.
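The following is a minimal sketch of what such a modification could look like, not the authors' reference implementation. It assumes the change amounts to swapping DPO's logarithmic link, applied to the density ratio $z = \pi(a)/\pi_{\text{ref}}(a)$, for the mixed link $\phi(z) = z + \log z$. The function names, the default `beta`, and the optional `clip` argument are hypothetical choices made for illustration, and the inputs are assumed to be sequence-level log-probabilities of the chosen (`w`) and rejected (`l`) responses.

```python
import math

def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)).
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    # Standard DPO: the link applied to the density ratio is log(z).
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -log_sigmoid(margin)

def chi_link(log_ratio):
    # Assumed mixed link phi(z) = z + log(z), with z = exp(log_ratio).
    # The linear term z is what brings in the chi-squared penalty.
    return math.exp(log_ratio) + log_ratio

def chipo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1, clip=None):
    # Same structure as DPO, with log(z) replaced by phi(z).
    margin = beta * (chi_link(logp_w - ref_logp_w) - chi_link(logp_l - ref_logp_l))
    if clip is not None:
        # Optional symmetric clipping of the margin (hypothetical parameter).
        margin = max(-clip, min(clip, margin))
    return -log_sigmoid(margin)
```

When the policy equals the reference, both margins are zero and both losses equal $\log 2$. The two objectives differ only once the density ratios move away from 1, where the linear term in `chi_link` grows much faster than the logarithm. Note that `math.exp` overflows for very large log-ratios, so a practical version would need to guard that term.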
Findings
$\chi^2$-Preference Optimization alleviates overoptimization.
It offers sample-complexity guarantees based on single-policy concentrability.
The method is simple, practical, and theoretically sound.
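For intuition on why a $\chi^2$ penalty is stronger than a KL penalty, the standard definitions and a one-line Jensen argument are sketched below. This uses the textbook, unnormalized $\chi^2$-divergence; the paper may use a different normalization (for example a factor of $1/2$), and the symbol $C^{\pi}$ for the density-ratio coverage quantity is notation assumed here for illustration.

```latex
\begin{align*}
D_{\mathrm{KL}}(\pi \,\|\, \pi_{\mathrm{ref}})
  &= \mathbb{E}_{a \sim \pi}\!\left[\log \frac{\pi(a)}{\pi_{\mathrm{ref}}(a)}\right], \\
\chi^2(\pi \,\|\, \pi_{\mathrm{ref}})
  &= \mathbb{E}_{a \sim \pi_{\mathrm{ref}}}\!\left[\left(\frac{\pi(a)}{\pi_{\mathrm{ref}}(a)} - 1\right)^{2}\right]
   = \mathbb{E}_{a \sim \pi}\!\left[\frac{\pi(a)}{\pi_{\mathrm{ref}}(a)}\right] - 1, \\
D_{\mathrm{KL}}(\pi \,\|\, \pi_{\mathrm{ref}})
  &\le \log \mathbb{E}_{a \sim \pi}\!\left[\frac{\pi(a)}{\pi_{\mathrm{ref}}(a)}\right]
   = \log\!\bigl(1 + \chi^2(\pi \,\|\, \pi_{\mathrm{ref}})\bigr)
   \le \chi^2(\pi \,\|\, \pi_{\mathrm{ref}}), \\
C^{\pi}
  &:= \mathbb{E}_{a \sim \pi}\!\left[\frac{\pi(a)}{\pi_{\mathrm{ref}}(a)}\right]
   = 1 + \chi^2(\pi \,\|\, \pi_{\mathrm{ref}}).
\end{align*}
```

The third line follows from Jensen's inequality applied to the concave logarithm. Under these conventions, bounding the $\chi^2$-divergence bounds the expected density ratio $C^{\pi}$ directly, whereas a bounded KL-divergence only constrains its logarithm from below.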
Abstract
Language model alignment methods such as reinforcement learning from human feedback (RLHF) have led to impressive advances in language model capabilities, but are limited by a widely observed phenomenon known as overoptimization, where the quality of the language model degrades over the course of the alignment process. As the model optimizes performance with respect to an offline reward model, it overfits to inaccuracies and drifts away from preferred responses covered by the data. To discourage such distribution shift, KL-regularization is widely employed in existing offline alignment methods, but overoptimization continues to harm performance. Lending theoretical insight into the source of these empirical observations, we first show that the KL-regularization is too weak to prevent overfitting, then raise the following question: is it possible to design an efficient algorithm that is…
Taxonomy
Topics: Advanced Multi-Objective Optimization Algorithms · Topology Optimization in Engineering · Neural Networks and Applications
Methods: Direct Preference Optimization
