Ranking-based Preference Optimization for Diffusion Models from Implicit User Feedback
Yi-Lun Wu, Bo-Kai Ruan, Chiang Tseng, Hong-Han Shuai

TL;DR
This paper introduces Diffusion-DRO (Diffusion Denoising Ranking Optimization), a preference learning framework for diffusion models that combines ranking with inverse reinforcement learning to better align generated images with human preferences, avoiding the non-linear probability-estimation issues of earlier DPO-style methods.
Contribution
Diffusion-DRO removes the need for a reward model by framing preference learning as a ranking problem, which reduces the training objective to a denoising formulation, and it integrates offline and online data to improve alignment.
Findings
Outperforms state-of-the-art baselines in quality metrics
Effectively captures human preferences with offline and online data
Improves generation quality on unseen prompts
Abstract
Direct preference optimization (DPO) methods have shown strong potential in aligning text-to-image diffusion models with human preferences by training on paired comparisons. These methods improve training stability by avoiding the REINFORCE algorithm but still struggle with challenges such as accurately estimating image probabilities due to the non-linear nature of the sigmoid function and the limited diversity of offline datasets. In this paper, we introduce Diffusion Denoising Ranking Optimization (Diffusion-DRO), a new preference learning framework grounded in inverse reinforcement learning. Diffusion-DRO removes the dependency on a reward model by casting preference learning as a ranking problem, thereby simplifying the training objective into a denoising formulation and overcoming the non-linear estimation issues found in prior methods. Moreover, Diffusion-DRO uniquely integrates…
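The abstract attributes part of the estimation difficulty to the non-linearity of the sigmoid. The sketch below is a generic illustration of that effect through Jensen's inequality, not the paper's derivation. It assumes the quantity inside the sigmoid is only available as a noisy per-sample estimate, and shows that averaging before and after applying log-sigmoid gives different answers. The helper names `log_sigmoid` and `jensen_gap`, and the Gaussian noise model, are hypothetical choices made only for this example.

```python
import numpy as np


def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)) = -log(1 + exp(-x)).
    return -np.logaddexp(0.0, -x)


def jensen_gap(samples):
    """Compare log_sigmoid(mean(x)) with mean(log_sigmoid(x)).

    Because log-sigmoid is concave, the second value can never exceed
    the first (Jensen's inequality), so a loss built from per-sample
    estimates is a biased surrogate for the loss at the true mean.
    """
    samples = np.asarray(samples, dtype=float)
    outside = log_sigmoid(samples.mean())   # non-linearity applied to the average
    inside = log_sigmoid(samples).mean()    # average of the non-linearity
    return outside, inside


# Hypothetical noisy estimates of a preference margin (assumed Gaussian).
rng = np.random.default_rng(0)
noisy_margin = rng.normal(loc=1.0, scale=2.0, size=10_000)

outside, inside = jensen_gap(noisy_margin)
print(f"log_sigmoid(mean) = {outside:.4f}")
print(f"mean(log_sigmoid) = {inside:.4f}")
print(f"gap               = {outside - inside:.4f}")
```

The gap grows with the variance of the estimates. An objective that is linear in the estimated quantity, such as a plain denoising loss, does not suffer from this particular bias, which is one way to read the motivation stated in the abstract.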
Taxonomy
Topics: Recommender Systems and Techniques · Advanced Multi-Objective Optimization Algorithms · Multimodal Machine Learning Applications
