New Desiderata for Direct Preference Optimization

Xiangkun Hu; Tong He; David Wipf

arXiv:2407.09072·cs.CL·July 15, 2024

TL;DR

This paper critically evaluates existing direct preference optimization (DPO) methods for aligning large language models with human preferences, identifies their limitations, and proposes an improved DPO-like loss with empirical validation.

Contribution

The paper introduces new evaluation criteria for DPO methods, uses them to highlight key shortcomings, and proposes a DPO-like loss that addresses these issues.

Findings

1. Existing DPO methods struggle to interpolate between a pre-trained reference model and human preferences.

2. Current DPO approaches face trade-offs in how they regularize responses and handle constraints.

3. The proposed DPO-like loss mitigates the identified limitations, improving alignment quality.

Abstract

Large language models in the past have typically relied on some form of reinforcement learning with human feedback (RLHF) to better align model responses with human preferences. However, because of oft-observed instabilities when implementing these RLHF pipelines, various reparameterization techniques have recently been introduced to sidestep the need for separately learning an RL reward model. Instead, directly fine-tuning for human preferences is achieved via the minimization of a single closed-form training objective, a process originally referred to as direct preference optimization (DPO) and followed by several notable descendants. Although effective in certain real-world settings, we introduce new evaluation criteria that serve to highlight unresolved shortcomings in the ability of existing DPO methods to interpolate between a pre-trained reference model and empirical measures of…
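For orientation, the "single closed-form training objective" mentioned in the abstract can be illustrated with the original DPO loss for one preference pair. This is a minimal sketch of that earlier loss, not the new loss proposed in this paper. It assumes the sequence log-probabilities under the policy and the reference model have already been computed; the function name `dpo_loss`, its argument names, and the default `beta=0.1` are illustrative assumptions.

```python
import math


def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Original DPO loss for a single (preferred, dispreferred) response pair.

    logp_w, logp_l         : policy log-probabilities of the preferred (w)
                             and dispreferred (l) responses
    ref_logp_w, ref_logp_l : the same quantities under the frozen reference model
    beta                   : strength of the pull toward the reference (assumed value)
    """
    # Implicit reward margin: beta times the difference of policy/reference log-ratios.
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # -log(sigmoid(margin)), written as softplus(-margin) in a numerically stable form.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


if __name__ == "__main__":
    # Policy identical to the reference: margin is 0, so the loss is log(2).
    print(dpo_loss(-1.0, -2.0, -1.0, -2.0))
    # Policy favors the preferred response more than the reference does: lower loss.
    print(dpo_loss(-0.5, -3.0, -1.0, -2.0))
```

When the policy equals the reference model the margin is zero and the loss equals log 2; it drops below that value only as the policy raises the preferred response relative to the dispreferred one, measured against the reference.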

Taxonomy

Topics: Recommender Systems and Techniques · Machine Learning and Data Classification · Emotion and Mood Recognition

Methods: Direct Preference Optimization · ALIGN