Cal-DPO: Calibrated Direct Preference Optimization for Language Model Alignment
Teng Xiao, Yige Yuan, Huaisheng Zhu, Mingxiao Li, Vasant G Honavar

TL;DR
Cal-DPO introduces a calibration technique for implicit rewards in preference optimization, significantly enhancing language model alignment with human preferences by ensuring reward scale comparability.
Contribution
The paper proposes Cal-DPO, a calibration-based method that improves language model alignment by addressing the scale mismatch in implicit reward optimization.
Findings
Cal-DPO outperforms existing methods on standard benchmarks.
Calibrating implicit rewards improves alignment quality.
Theoretical analysis supports Cal-DPO's advantages.
Abstract
We study the problem of aligning large language models (LLMs) with human preference data. Contrastive preference optimization has shown promising results in aligning LLMs with available preference data by optimizing the implicit reward associated with the policy. However, the contrastive objective focuses mainly on the relative values of implicit rewards associated with two responses while ignoring their actual values, resulting in suboptimal alignment with human preferences. To address this limitation, we propose calibrated direct preference optimization (Cal-DPO), a simple yet effective algorithm. We show that substantial improvement in alignment with the given preferences can be achieved simply by calibrating the implicit reward to ensure that the learned implicit rewards are comparable in scale to the ground-truth rewards. We demonstrate the theoretical advantages of Cal-DPO over…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsNatural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis
