SIPO: Stabilized and Improved Preference Optimization for Aligning Diffusion Models

Xiaomeng Yang; Mengping Yang; Junyan Wang; Zhijian Zhou; Zhiyu Tan; Hao Li

arXiv:2505.21893·cs.LG·May 19, 2026

TL;DR

SIPO introduces a stabilized and improved preference optimization framework that enhances the alignment of diffusion models with human preferences by addressing training instability and off-policy bias.

Contribution

The paper proposes SIPO, a novel framework with gradient clipping and timestep-aware importance reweighting, to improve stability and effectiveness in preference-based diffusion model alignment.
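To make the two ingredients named above concrete, here is a minimal NumPy sketch of a Diffusion-DPO-style preference loss with a per-timestep weight and a clipped margin. It is an illustration under stated assumptions, not the paper's formulation: the source does not give SIPO's exact objective, so the function name `weighted_preference_loss`, the `step_weights` table, the `clip` bound, and the choice to clip the loss margin (as a stand-in for gradient clipping) are all hypothetical. The Diffusion-DPO constant factors are folded into `beta`.

```python
import numpy as np


def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)) = -log(1 + exp(-x)).
    return -np.logaddexp(0.0, -x)


def weighted_preference_loss(err_w, err_w_ref, err_l, err_l_ref,
                             t, step_weights, beta=0.1, clip=5.0):
    """Hypothetical clipped, timestep-weighted Diffusion-DPO-style loss.

    err_w, err_l:         policy denoising errors ||eps - eps_theta||^2 on the
                          preferred / rejected sample, shape (batch,)
    err_w_ref, err_l_ref: the same errors under the frozen reference model
    t:                    integer timestep per sample, shape (batch,)
    step_weights:         assumed per-timestep importance weights, shape (T,)
    """
    # Diffusion-DPO-style margin: negative when the policy improves on the
    # preferred sample more than on the rejected one, relative to the reference.
    margin = (err_w - err_w_ref) - (err_l - err_l_ref)

    # Assumption: bound the margin so a single noisy timestep cannot produce
    # an arbitrarily large loss term (and hence gradient).
    margin = np.clip(margin, -clip, clip)

    # Assumption: look up a weight for each sampled timestep.
    weights = step_weights[t]

    per_sample = -log_sigmoid(-beta * margin)
    # Weighted mean; the guard avoids division by zero if all weights are 0.
    return float((weights * per_sample).sum() / max(weights.sum(), 1e-12))


# Usage with made-up numbers (not results from the paper).
step_weights = np.linspace(0.1, 1.0, 1000)
t = np.array([10, 500, 900])
neutral = np.ones(3)
loss_neutral = weighted_preference_loss(neutral, neutral, neutral, neutral,
                                        t, step_weights)
loss_better = weighted_preference_loss(0.5 * neutral, neutral, neutral, neutral,
                                       t, step_weights)
```

With a zero margin the loss equals log 2 regardless of the weights, and it falls below that once the policy fits the preferred sample better than the reference does. Passing the weights in as a table keeps the sketch agnostic about which timesteps should be down-weighted.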

Findings

01. SIPO stabilizes training across various diffusion models.

02. SIPO outperforms existing alignment methods on multiple benchmarks.

03. The work provides guidelines for timestep-aware preference optimization.

Abstract

Preference learning has garnered extensive attention as an effective technique for aligning diffusion models with human preferences in visual generation. However, existing alignment approaches such as Diffusion-DPO suffer from two fundamental challenges: training instability caused by high gradient variances at various timesteps and high parameter sensitivities, and off-policy bias arising from the discrepancy between the optimization data and the policy models' distribution. Our first contribution is a systematic analysis of diffusion trajectories across different timesteps, identifying that the instability primarily originates from early timesteps with low importance weights. To address these issues, we propose SIPO, a Stabilized and Improved Preference Optimization framework for aligning diffusion models with human preferences. Concretely,…
