Ranking-based Preference Optimization for Diffusion Models from Implicit User Feedback

Yi-Lun Wu; Bo-Kai Ruan; Chiang Tseng; Hong-Han Shuai

arXiv:2510.18353·cs.CV·October 22, 2025

TL;DR

This paper introduces Diffusion-DRO, a preference learning framework for diffusion models that uses ranking and inverse reinforcement learning to better align generated images with human preferences, avoiding the non-linear probability-estimation issues of earlier DPO-style methods.

Contribution

Diffusion-DRO removes the need for a reward model by framing preference learning as a ranking problem, which simplifies the training objective into a denoising formulation. It also integrates offline and online data for improved alignment.

Findings

1. Outperforms state-of-the-art baselines in quality metrics
2. Effectively captures human preferences with offline and online data
3. Improves generation quality on unseen prompts

Abstract

Direct preference optimization (DPO) methods have shown strong potential in aligning text-to-image diffusion models with human preferences by training on paired comparisons. These methods improve training stability by avoiding the REINFORCE algorithm but still struggle with challenges such as accurately estimating image probabilities due to the non-linear nature of the sigmoid function and the limited diversity of offline datasets. In this paper, we introduce Diffusion Denoising Ranking Optimization (Diffusion-DRO), a new preference learning framework grounded in inverse reinforcement learning. Diffusion-DRO removes the dependency on a reward model by casting preference learning as a ranking problem, thereby simplifying the training objective into a denoising formulation and overcoming the non-linear estimation issues found in prior methods. Moreover, Diffusion-DRO uniquely integrates…
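The non-linearity concern mentioned in the abstract can be illustrated with a toy calculation. The sketch below is not the paper's objective; it only shows, under assumed numbers, that the sigmoid of an averaged noisy quantity differs from the average of the sigmoids, which is why an estimate placed inside a sigmoid is sensitive to sampling noise. The Gaussian noise model, the `mean` and `std` values, and the helper name `sigmoid_gap` are all hypothetical choices made for illustration.

```python
import math
import random
from typing import Tuple


def sigmoid(x: float) -> float:
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_gap(mean: float, std: float, n_samples: int, seed: int = 0) -> Tuple[float, float]:
    """Compare sigmoid(mean of samples) with mean of sigmoid(samples).

    The samples stand in for noisy estimates of a preference logit
    (an assumption for illustration, not the paper's estimator).
    """
    rng = random.Random(seed)
    samples = [rng.gauss(mean, std) for _ in range(n_samples)]
    sig_of_mean = sigmoid(sum(samples) / n_samples)
    mean_of_sig = sum(sigmoid(s) for s in samples) / n_samples
    return sig_of_mean, mean_of_sig


# Assumed toy values: a positive logit observed with large noise.
sig_of_mean, mean_of_sig = sigmoid_gap(mean=2.0, std=3.0, n_samples=20000)
print(f"sigmoid(E[x]) = {sig_of_mean:.3f}")
print(f"E[sigmoid(x)] = {mean_of_sig:.3f}")
print(f"gap           = {sig_of_mean - mean_of_sig:.3f}")
```

Because the two quantities disagree whenever the noise is non-negligible, an objective that avoids passing a sampled estimate through a sigmoid sidesteps this particular source of error.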

Code & Models

Models

ylwu/diffusion-dro-sd1.5 (Hugging Face model)

Videos

Ranking-based Preference Optimization for Diffusion Models from Implicit User Feedback · SlidesLive

Taxonomy

Topics: Recommender Systems and Techniques · Advanced Multi-Objective Optimization Algorithms · Multimodal Machine Learning Applications