Correcting the Mythos of KL-Regularization: Direct Alignment without Overoptimization via Chi-Squared Preference Optimization

Audrey Huang; Wenhao Zhan; Tengyang Xie; Jason D. Lee; Wen Sun; Akshay Krishnamurthy; Dylan J. Foster

arXiv:2407.13399·cs.AI·February 19, 2025

TL;DR

This paper introduces $\chi^2$-Preference Optimization, a simple yet theoretically robust offline alignment method that prevents overoptimization in language models by regularizing with the $\chi^2$-divergence instead of the KL-divergence.
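To make the contrast between the two regularizers concrete, the sketch below compares the KL-divergence and the $\chi^2$-divergence on a toy three-response distribution. It is an illustration, not code from the paper: the function names, the toy reference distribution, and the choice of the unnormalized Pearson convention $\sum_i (p_i - q_i)^2 / q_i$ are all assumptions made here (some texts include an extra factor of 1/2).

```python
import math

def kl_divergence(p, q):
    """KL(p || q) = sum_i p_i * log(p_i / q_i) for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def chi2_divergence(p, q):
    """Pearson chi-squared divergence: sum_i (p_i - q_i)^2 / q_i (assumed convention)."""
    return sum((pi - qi) ** 2 / qi for pi, qi in zip(p, q))

def shifted_policy(mass):
    """Hypothetical policy that puts `mass` on the third response, splitting the rest evenly."""
    rest = (1.0 - mass) / 2.0
    return [rest, rest, mass]

# Toy reference policy: the third response is rarely produced, so it is poorly covered.
REFERENCE = [0.5, 0.49, 0.01]

if __name__ == "__main__":
    for mass in (0.01, 0.1, 0.5, 0.9):
        policy = shifted_policy(mass)
        kl = kl_divergence(policy, REFERENCE)
        chi2 = chi2_divergence(policy, REFERENCE)
        print(f"mass on rare response = {mass:4.2f}  KL = {kl:8.4f}  chi2 = {chi2:8.4f}")
```

Under this convention the $\chi^2$-divergence is never smaller than the KL-divergence, and it grows much faster as the policy concentrates on a response the reference rarely produces, which gives some intuition for why it is the stronger penalty on distribution shift.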

Contribution

The paper proposes a minimal modification to DPO that implements $\chi^2$-divergence-based regularization, yielding the first offline alignment algorithm that is provably robust to overoptimization.
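The summary on this page does not spell out what the modification is. As we understand the full paper, it replaces the logarithmic link $\log z$ in the DPO objective with $\phi(z) = z + \log z$, where $z$ is the policy-to-reference probability ratio, and clips the resulting margin. The sketch below shows that change for a single preference pair under those assumptions; the function names, the `beta` default, and the `clip` parameter are hypothetical and not taken from any released implementation.

```python
import math

def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """DPO loss for one (chosen, rejected) pair: the link applied to the ratio z is log(z)."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -log_sigmoid(margin)

def chi_po_link(log_ratio):
    """Assumed link phi(z) = z + log(z), evaluated from log(z) = log pi - log pi_ref."""
    # math.exp overflows for very large log-ratios; a real implementation would guard this.
    return math.exp(log_ratio) + log_ratio

def chi_po_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1, clip=None):
    """Same structure as dpo_loss, with the log link swapped for chi_po_link."""
    margin = beta * (chi_po_link(logp_w - ref_logp_w) - chi_po_link(logp_l - ref_logp_l))
    if clip is not None:
        # Optional symmetric clipping of the margin (hypothetical parameter name).
        margin = max(-clip, min(clip, margin))
    return -log_sigmoid(margin)

if __name__ == "__main__":
    # Policy has raised the chosen response's log-probability by 3 nats over the reference.
    args = (-1.0, -5.0, -4.0, -5.0)
    print("DPO loss: ", dpo_loss(*args))
    print("chiPO loss:", chi_po_loss(*args, clip=1.0))
```

The only difference between the two loss functions is the link applied to the log-ratio, so an existing DPO training loop that already computes per-sequence log-probabilities for the policy and the reference could in principle adopt this change without altering its data pipeline.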

Findings

01

$\chi^2$-Preference Optimization alleviates overoptimization.

02

It offers sample-complexity guarantees based on single-policy concentrability.

03

The method is simple, practical, and theoretically sound.

Abstract

Language model alignment methods such as reinforcement learning from human feedback (RLHF) have led to impressive advances in language model capabilities, but are limited by a widely observed phenomenon known as overoptimization, where the quality of the language model degrades over the course of the alignment process. As the model optimizes performance with respect to an offline reward model, it overfits to inaccuracies and drifts away from preferred responses covered by the data. To discourage such distribution shift, KL-regularization is widely employed in existing offline alignment methods, but overoptimization continues to harm performance. Lending theoretical insight into the source of these empirical observations, we first show that the KL-regularization is too weak to prevent overfitting, then raise the following question: is it possible to design an efficient algorithm that is…

Taxonomy

Topics: Advanced Multi-Objective Optimization Algorithms · Topology Optimization in Engineering · Neural Networks and Applications

Methods: Direct Preference Optimization