Smoothed Preference Optimization via ReNoise Inversion for Aligning Diffusion Models with Varied Human Preferences

Yunhong Lu; Qichao Wang; Hengyuan Cao; Xiaoyin Xu; Min Zhang

arXiv:2506.02698·cs.CV·June 9, 2025

TL;DR

This paper introduces SmPO-Diffusion, a novel method for aligning diffusion models with human preferences by modeling preference distributions, leading to improved performance and reduced training costs.

Contribution

The paper proposes a smoothed preference distribution and an inversion technique to better align diffusion models with human preferences, addressing misalignment in the optimization objective.

Findings

01 Achieves state-of-the-art preference evaluation performance

02 Outperforms baseline methods across multiple metrics

03 Reduces training costs significantly

Abstract

Direct Preference Optimization (DPO) aligns text-to-image (T2I) generation models with human preferences using pairwise preference data. Although substantial resources are expended in collecting and labeling datasets, a critical aspect is often neglected: preferences vary across individuals and should be represented with more granularity. To address this, we propose SmPO-Diffusion, a novel method for modeling preference distributions to improve the DPO objective, along with a numerical upper bound estimation for the diffusion optimization objective. First, we introduce a smoothed preference distribution to replace the original binary distribution. We employ a reward model to simulate human preferences and apply preference likelihood averaging to improve the DPO loss, such that the loss function approaches zero when preferences are similar. Furthermore, we utilize an inversion…
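
The idea of replacing a binary preference label with a smoothed one can be illustrated with a small sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function names, the Bradley-Terry style mapping from reward scores to a soft preference, and the reading of "preference likelihood averaging" as a geometric mixture of the two likelihoods, which scales the usual DPO margin by (2p - 1). The margin itself (the difference of policy-versus-reference log-ratios for the two images) is assumed to be computed elsewhere.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def smoothed_preference(reward_a: float, reward_b: float,
                        temperature: float = 1.0) -> float:
    """Soft probability that image A is preferred over image B.

    ASSUMPTION: a Bradley-Terry style mapping from reward-model scores;
    the abstract does not state the exact form used.
    """
    return sigmoid((reward_a - reward_b) / temperature)


def smoothed_dpo_loss(margin: float, pref: float, beta: float = 1.0) -> float:
    """Hypothetical smoothed DPO loss: -log sigmoid(beta * (2p - 1) * margin).

    margin: DPO implicit-reward margin, log-ratio(A) - log-ratio(B).
    pref:   smoothed probability that A is preferred over B.
    ASSUMPTION: geometric averaging of likelihoods, not the paper's formula.
    """
    scaled = beta * (2.0 * pref - 1.0) * margin
    return softplus(-scaled)  # equals -log(sigmoid(scaled))


if __name__ == "__main__":
    p_similar = smoothed_preference(0.30, 0.30)  # reward model sees a tie
    p_clear = smoothed_preference(2.0, -2.0)     # reward model strongly prefers A
    for name, p in [("similar", p_similar), ("clear", p_clear)]:
        print(name, round(p, 3), [round(smoothed_dpo_loss(m, p), 4) for m in (-1.0, 0.0, 1.0)])
```

With pref = 1.0 the sketch reduces to the standard binary DPO loss. With pref = 0.5 the scaled margin is zero for every input, so the pair contributes no gradient; whether the paper normalizes its loss so that the value itself goes to zero is not recoverable from the truncated abstract.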

Videos

Smoothed Preference Optimization via ReNoise Inversion for Aligning Diffusion Models with Varied Human Preferences · SlidesLive

Taxonomy

Topics: Advanced Multi-Objective Optimization Algorithms · Mobile Crowdsensing and Crowdsourcing · Multimodal Machine Learning Applications

Methods: Direct Preference Optimization · Diffusion