Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization

Zhanhui Zhou; Jie Liu; Jing Shao; Xiangyu Yue; Chao Yang; Wanli Ouyang; Yu Qiao

arXiv:2310.03708·cs.LG·August 20, 2024·2 cites

TL;DR

MODPO is a stable, efficient, RL-free method for multi-objective language model alignment. It matches or exceeds existing methods on safety alignment and long-form QA while reducing computational cost.

Contribution

The paper extends Direct Preference Optimization (DPO) to handle multiple objectives without reinforcement learning, enabling more stable and resource-efficient multi-preference alignment.

Findings

01

MODPO matches or exceeds existing methods on safety alignment and long-form QA tasks.

02

It produces a Pareto front of models for diverse preferences.

03

It uses three times fewer computational resources than MORLHF.
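
To make the "Pareto front" finding concrete, here is a minimal Python sketch of how a non-dominated set can be picked out from a collection of models scored on two objectives. The function name `pareto_front` and the (helpfulness, safety) scores are hypothetical illustrations, not results or code from the paper.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def pareto_front(points: List[Point]) -> List[Point]:
    """Return the points that no other point dominates.

    Higher is better on both axes. A point q dominates p when q is at
    least as good on both objectives and is not the same point.
    """
    front = []
    for p in points:
        dominated = any(
            q != p and q[0] >= p[0] and q[1] >= p[1]
            for q in points
        )
        if not dominated:
            front.append(p)
    return front


# Hypothetical (helpfulness, safety) scores, one pair per trained model.
scores = [(0.9, 0.2), (0.7, 0.6), (0.4, 0.9), (0.5, 0.5), (0.3, 0.3)]
front = pareto_front(scores)  # the last two models are dominated by (0.7, 0.6)
```

Each model on the front represents a different trade-off, so a user with a given preference can pick the one that suits them without any other model being strictly better on both objectives.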

Abstract

A single language model, even when aligned with labelers through reinforcement learning from human feedback (RLHF), may not suit all human preferences. Recent approaches therefore prefer customization, gathering multi-dimensional feedback, and creating distinct reward models for each dimension. Different language models are then optimized for various preferences using multi-objective RLHF (MORLHF) with varying reward weights. However, RL fine-tuning is unstable and resource-heavy, especially with diverse and usually conflicting objectives. In this paper, we present Multi-Objective Direct Preference Optimization (MODPO), an RL-free extension of Direct Preference Optimization (DPO) for multiple alignment objectives. Essentially, MODPO folds language modeling directly into reward modeling, training language models as implicit collective reward models that combine all objectives with…
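
The MORLHF setup the abstract describes (one reward model per feedback dimension, combined under varying reward weights) can be sketched as follows. This is a minimal illustration that assumes the rewards are combined by a weighted sum; the dimension names `helpful` and `safe` and the helpers `scalarize` and `weight_sweep` are hypothetical and not taken from the paper.

```python
from typing import Dict, List


def scalarize(rewards: Dict[str, float], weights: Dict[str, float]) -> float:
    """Combine per-dimension rewards into one scalar via a weighted sum (assumed form)."""
    return sum(weights[name] * rewards[name] for name in weights)


def weight_sweep(steps: int) -> List[Dict[str, float]]:
    """Evenly spaced weightings over two hypothetical dimensions; each sums to 1."""
    return [
        {"helpful": i / (steps - 1), "safe": 1 - i / (steps - 1)}
        for i in range(steps)
    ]


# Hypothetical reward-model outputs for a single response.
rewards = {"helpful": 0.8, "safe": 0.4}

# Each weighting would define the training signal for a separately optimized model.
combined = [scalarize(rewards, w) for w in weight_sweep(3)]
```

Training one model per weighting is what yields a family of models for different preferences, and it is also why an RL-based pipeline becomes costly as the number of weightings grows.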

Taxonomy

Topics: Software Engineering Research · Speech and dialogue systems · Software Engineering Techniques and Practices

Methods: OPT