KL Penalty Control via Perturbation for Direct Preference Optimization

Sangkyu Lee; Janghoon Han; Hosung Song; Stanley Jungkyu Choi; Honglak Lee; Youngjae Yu

arXiv:2502.13177 · cs.LG · October 28, 2025

TL;DR

This paper introduces $ε$-DPO, an adaptive method for controlling the KL penalty in Direct Preference Optimization. It improves the alignment of language models with human preferences by adjusting the penalty for each preference pair.

Contribution

The paper proposes $ε$-DPO, a novel adaptive KL penalty control method that adjusts penalty strength dynamically for each preference pair during training.

Findings

1. $ε$-DPO significantly improves DPO performance on chatbot benchmarks.

2. Adaptive KL penalty control reflects preference model confusion and enhances preference confidence.

3. The method provides an efficient trade-off in KL penalty, leading to better alignment results.

Abstract

Direct Preference Optimization (DPO) demonstrates the advantage of aligning a large language model with human preference using only an offline dataset. However, DPO has the limitation that the KL penalty, which prevents excessive deviation from the reference model, is static throughout the training process. Several methods claim to change this static KL penalty of DPO into a dynamic one, but no approach can adaptively assign different KL penalties for each preference pair. In this paper, we propose $ε$-Direct Preference Optimization ($ε$-DPO), which allows adaptive control of the KL penalty strength $β$ for each preference pair. Specifically, $ε$-DPO adaptively controls $β$ for each preference pair based on the monotonicity of logits as a preference model under the perturbation of $β$ during training. This is equivalent to adjusting the KL…
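To make the idea of a per-pair $β$ concrete, here is a minimal sketch in plain Python. `dpo_pair_loss` is the standard DPO loss for one preference pair with its own $β$. `adjust_beta` is a hypothetical update rule written only for illustration: the perturbed values $β/(1+ε)$ and $β/(1-ε)$, the direction of each update, and the three logit inputs are assumptions, not details given in the abstract. How the logits under perturbed $β$ are estimated is left to the caller.

```python
import math


def dpo_pair_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta):
    """Standard DPO loss for a single preference pair with its own beta.

    logp_w / logp_l: policy log-probabilities of the chosen / rejected response.
    ref_logp_w / ref_logp_l: the same quantities under the reference model.
    """
    margin = (logp_w - ref_logp_w) - (logp_l - ref_logp_l)
    z = beta * margin
    # -log(sigmoid(z)) computed stably as softplus(-z).
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))


def adjust_beta(beta, logit_low, logit_mid, logit_high, eps=0.01):
    """Hypothetical per-pair beta update (an assumption, not the paper's code).

    logit_low / logit_mid / logit_high: preference logits of the same pair,
    assumed to be measured at a lower, the current, and a higher beta.
    """
    if logit_low > logit_mid > logit_high:
        # Logit grows monotonically as beta shrinks: relax the KL penalty.
        return beta / (1.0 + eps)
    if logit_low < logit_mid < logit_high:
        # Logit grows monotonically as beta grows: tighten the KL penalty.
        return beta / (1.0 - eps)
    # No monotone trend: keep beta unchanged for this pair.
    return beta


if __name__ == "__main__":
    beta = adjust_beta(0.1, logit_low=1.2, logit_mid=1.0, logit_high=0.7)
    print(beta, dpo_pair_loss(-12.0, -15.0, -13.0, -14.5, beta))
```

Keeping the update rule separate from the loss means each pair in a batch can carry its own $β$ while the loss itself stays the usual DPO objective.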

Code & Models

Repositories

oddqueue/e-dpo (PyTorch, official)

Taxonomy

Topics: Advanced Control Systems Optimization · Advanced Database Systems and Queries

Methods: Direct Preference Optimization