What Matters in Data for DPO?

Yu Pan; Zhongze Cai; Guanting Chen; Huaiyang Zhong; Chonghuan Wang

arXiv:2508.18312·cs.LG·November 10, 2025

TL;DR

This paper systematically investigates how the distribution and quality of preference data affect the performance of Direct Preference Optimization (DPO) in aligning large language models, highlighting the importance of chosen responses.

Contribution

The paper provides a theoretical and empirical analysis of preference data characteristics, emphasizing the dominant role of chosen response quality in DPO effectiveness.

Findings

01 Quality of chosen responses plays a dominant role in DPO performance, while the quality of rejected responses may have relatively limited impact

02 Contrastiveness between responses helps primarily by improving the chosen samples

03 Mixing on-policy data can improve alignment outcomes

Abstract

Direct Preference Optimization (DPO) has emerged as a simple and effective approach for aligning large language models (LLMs) with human preferences, bypassing the need for a learned reward model. Despite its growing adoption, a fundamental question remains open: what characteristics of preference data are most critical for DPO performance? In this work, we provide a systematic study of how preference data distribution influences DPO, from both theoretical and empirical perspectives. We show that the quality of chosen responses plays a dominant role in optimizing the DPO objective, while the quality of rejected responses may have relatively limited impact. Our theoretical analysis characterizes the optimal response distribution under DPO and reveals how contrastiveness between responses helps primarily by improving the chosen samples. We further study an online DPO setting and show it…
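For readers unfamiliar with the objective the abstract refers to, the following is a minimal sketch of the standard per-example DPO loss as it is commonly written, not code from this paper. The function name `dpo_loss`, its argument names, and the default `beta=0.1` are illustrative assumptions. Inputs are summed log-probabilities of the chosen and rejected responses under the policy and under the frozen reference model.

```python
import math


def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Per-example DPO loss: -log sigmoid(beta * (chosen log-ratio - rejected log-ratio)).

    All names here are hypothetical; beta is the usual temperature on the
    implicit reward, and 0.1 is only a placeholder default.
    """
    chosen_log_ratio = logp_chosen - ref_logp_chosen        # log pi(y_w|x) - log pi_ref(y_w|x)
    rejected_log_ratio = logp_rejected - ref_logp_rejected  # log pi(y_l|x) - log pi_ref(y_l|x)
    margin = beta * (chosen_log_ratio - rejected_log_ratio)

    # -log(sigmoid(margin)) == log(1 + exp(-margin)); branch for numerical stability.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# When the policy equals the reference, the margin is zero and the loss is log(2).
baseline = dpo_loss(-1.0, -1.0, -1.0, -1.0)
# Raising the policy's log-probability of the chosen response lowers the loss.
improved = dpo_loss(-0.5, -1.0, -1.0, -1.0)
```

The loss depends on the data only through the chosen and rejected log-ratios, which is why the distribution of each side of a preference pair can be studied separately.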
