Margin-aware Preference Optimization for Aligning Diffusion Models without Reference

Jiwoo Hong; Sayak Paul; Noah Lee; Kashif Rasul; James Thorne; Jongheon Jeong

arXiv:2406.06424·cs.CV·December 4, 2025

TL;DR

This paper introduces MaPO, a reference-agnostic preference optimization method that improves alignment of diffusion models across diverse tasks by addressing the limitations of reference-based methods like DPO.

Contribution

MaPO is a novel, reference-free approach that directly optimizes preference likelihood margins, enabling more effective and versatile diffusion model alignment without reference mismatch issues.

Findings

1. MaPO outperforms DPO and DreamBooth across five domains.

2. MaPO reduces training time by 15%.

3. MaPO is more robust to reference mismatch severity.

Abstract

Modern preference alignment methods, such as DPO, rely on divergence regularization to a reference model for training stability, but this creates a fundamental problem we call "reference mismatch." In this paper, we investigate the negative impacts of reference mismatch in aligning text-to-image (T2I) diffusion models, showing that larger reference mismatch hinders effective adaptation given the same amount of data, e.g., when learning new artistic styles or personalizing to specific objects. We demonstrate this phenomenon across T2I diffusion models and introduce margin-aware preference optimization (MaPO), a reference-agnostic approach that breaks free from this constraint. By directly optimizing the likelihood margin between preferred and dispreferred outputs under the Bradley-Terry model without anchoring to a reference, MaPO transforms diverse T2I tasks into…
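
The margin idea in the abstract can be illustrated with a minimal sketch. It is not the paper's exact objective, which may include further terms not visible in this truncated abstract. It assumes each output already has a scalar score acting as a log-likelihood proxy (for a diffusion model, something like a negative denoising error). The function names, the `beta` scale, and the scores are all hypothetical and chosen only for illustration. A reference-anchored, DPO-style variant is included for contrast.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def margin_loss(score_w: float, score_l: float, beta: float = 1.0) -> float:
    """Reference-free Bradley-Terry loss (illustrative, hypothetical names).

    score_w / score_l are assumed log-likelihood proxies for the preferred
    and dispreferred outputs. The loss shrinks as the margin grows.
    """
    return -log_sigmoid(beta * (score_w - score_l))


def dpo_style_loss(score_w: float, score_l: float,
                   ref_w: float, ref_l: float, beta: float = 1.0) -> float:
    """Reference-anchored variant, shown only for contrast.

    The margin is measured relative to a frozen reference model's scores
    (ref_w, ref_l), so a poorly matched reference shifts the whole objective.
    """
    return -log_sigmoid(beta * ((score_w - ref_w) - (score_l - ref_l)))


if __name__ == "__main__":
    # Preferred output scores higher than the dispreferred one: small loss.
    print(margin_loss(2.0, 0.0))
    # Ordering reversed: larger loss.
    print(margin_loss(0.0, 2.0))
```

In this sketch the reference-free loss depends only on the model's own margin, whereas the DPO-style loss also depends on the reference scores; when the reference scores are identical for both outputs, the two coincide.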

Taxonomy

Topics: Urban and Freight Transport Logistics

Methods: Direct Preference Optimization · Balanced Selection · Focus · Diffusion