Cal-DPO: Calibrated Direct Preference Optimization for Language Model Alignment

Teng Xiao; Yige Yuan; Huaisheng Zhu; Mingxiao Li; Vasant G Honavar

arXiv:2412.14516 · cs.LG · December 20, 2024

TL;DR

Cal-DPO calibrates the implicit rewards learned during preference optimization so that they are comparable in scale to the ground-truth rewards, which the authors report substantially improves language model alignment with human preferences.

Contribution

The paper proposes Cal-DPO, a calibration-based method that improves language model alignment by correcting the scale mismatch between learned implicit rewards and ground-truth rewards.

Findings

01 Cal-DPO outperforms existing methods on standard benchmarks.

02 Calibrating implicit rewards improves alignment quality.

03 Theoretical analysis supports Cal-DPO's advantages.

Abstract

We study the problem of aligning large language models (LLMs) with human preference data. Contrastive preference optimization has shown promising results in aligning LLMs with available preference data by optimizing the implicit reward associated with the policy. However, the contrastive objective focuses mainly on the relative values of implicit rewards associated with two responses while ignoring their actual values, resulting in suboptimal alignment with human preferences. To address this limitation, we propose calibrated direct preference optimization (Cal-DPO), a simple yet effective algorithm. We show that substantial improvement in alignment with the given preferences can be achieved simply by calibrating the implicit reward to ensure that the learned implicit rewards are comparable in scale to the ground-truth rewards. We demonstrate the theoretical advantages of Cal-DPO over…
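
The limitation described in the abstract can be illustrated with a small numeric sketch. It assumes the standard DPO definition of the implicit reward, beta * (log pi(y|x) - log pi_ref(y|x)), and a logistic contrastive loss on the reward difference. The calibration_penalty function, its squared-error form, and the target values are hypothetical stand-ins chosen for illustration; they are not taken from this page and should not be read as the Cal-DPO objective itself.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def implicit_reward(logp_policy: float, logp_ref: float, beta: float) -> float:
    """Implicit reward under the standard DPO definition (assumed here)."""
    return beta * (logp_policy - logp_ref)


def contrastive_loss(r_chosen: float, r_rejected: float) -> float:
    """Logistic loss that depends only on the difference of the two rewards."""
    return -log_sigmoid(r_chosen - r_rejected)


def calibration_penalty(r_chosen: float, r_rejected: float,
                        target_chosen: float, target_rejected: float) -> float:
    """Hypothetical squared-error term tying each reward to a target value."""
    return (r_chosen - target_chosen) ** 2 + (r_rejected - target_rejected) ** 2


# Illustrative sequence log-probabilities (made-up numbers).
beta = 0.5
r_w = implicit_reward(logp_policy=-10.0, logp_ref=-11.0, beta=beta)  # chosen
r_l = implicit_reward(logp_policy=-12.0, logp_ref=-11.0, beta=beta)  # rejected

# Shift both rewards down by the same constant: the gap is unchanged.
shift = -3.0
loss_original = contrastive_loss(r_w, r_l)
loss_shifted = contrastive_loss(r_w + shift, r_l + shift)

# A term on absolute values distinguishes the two cases.
penalty_original = calibration_penalty(r_w, r_l, 0.5, -0.5)
penalty_shifted = calibration_penalty(r_w + shift, r_l + shift, 0.5, -0.5)

print(loss_original, loss_shifted)
print(penalty_original, penalty_shifted)
```

In this sketch the contrastive loss is identical before and after the shift, even though the chosen response's reward has become negative, while the penalty on absolute values grows. That is the sense in which a purely contrastive objective ignores the actual reward values.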

Code & Models

Repositories

tengxiao1/cal-dpo (PyTorch, official)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis