Filtered Direct Preference Optimization

Tetsuro Morimura; Mitsuki Sakamoto; Yuu Jinnai; Kenshi Abe; Kaito Ariu

arXiv:2404.13846 · cs.LG · December 4, 2024

TL;DR

This paper investigates how text quality in preference datasets affects RLHF, introduces filtered DPO (fDPO), which discards low-quality samples during training, and reports improved model performance.

Contribution

The paper proposes fDPO, an extension of DPO that uses a reward model to filter low-quality data out of the preference dataset, making RLHF more effective.

Findings

01 fDPO improves model performance over standard DPO.

02 Text quality significantly affects RLHF outcomes.

03 Filtering data during training leads to more accurate preference models.

Abstract

Reinforcement learning from human feedback (RLHF) plays a crucial role in aligning language models with human preferences. While the significance of dataset quality is generally recognized, explicit investigations into its impact within the RLHF framework, to our knowledge, have been limited. This paper addresses the issue of text quality within the preference dataset by focusing on direct preference optimization (DPO), an increasingly adopted reward-model-free RLHF method. We confirm that text quality significantly influences the performance of models optimized with DPO more than those optimized with reward-model-based RLHF. Building on this new insight, we propose an extension of DPO, termed filtered direct preference optimization (fDPO). fDPO uses a trained reward model to monitor the quality of texts within the preference dataset during DPO training. Samples of lower quality are…
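The filtering idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `PreferencePair`, `filter_pairs`, `reward_fn`, and `baseline_fn` are hypothetical, and the keep-or-drop rule (keep a pair when the reward model scores its chosen text at or above a caller-supplied baseline) is an assumption made for illustration, since the abstract above is truncated before it states the exact criterion. The `dpo_loss` function follows the standard per-pair DPO objective, with `beta` as the usual temperature parameter.

```python
import math
from typing import Callable, List, NamedTuple


class PreferencePair(NamedTuple):
    """One preference sample (hypothetical container for this sketch)."""
    prompt: str
    chosen: str
    rejected: str


def filter_pairs(
    pairs: List[PreferencePair],
    reward_fn: Callable[[str, str], float],
    baseline_fn: Callable[[str], float],
) -> List[PreferencePair]:
    """Keep pairs whose chosen text scores at least the baseline.

    Assumption: the baseline is supplied by the caller; the source text
    does not specify how the quality comparison is made.
    """
    kept = []
    for pair in pairs:
        if reward_fn(pair.prompt, pair.chosen) >= baseline_fn(pair.prompt):
            kept.append(pair)
    return kept


def dpo_loss(
    logp_chosen: float,
    logp_rejected: float,
    ref_logp_chosen: float,
    ref_logp_rejected: float,
    beta: float = 0.1,
) -> float:
    """Standard per-pair DPO loss: -log(sigmoid(margin)).

    margin = beta * [(log pi(y_w) - log pi_ref(y_w))
                     - (log pi(y_l) - log pi_ref(y_l))]
    """
    margin = beta * (
        (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    )
    # -log(sigmoid(m)) equals softplus(-m); branch for numerical stability.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

In a training loop, a filter like `filter_pairs` would be re-applied periodically (for example, once per epoch) so that the DPO loss is computed only on the pairs that remain; how often and against what baseline are design choices this sketch leaves open.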

Code & Models

Repositories

cyberagentailab/filtered-dpo (PyTorch, official)

Videos

Filtered Direct Preference Optimization (Underline)

Taxonomy

Topics: Constraint Satisfaction and Optimization

Methods: Direct Preference Optimization