Refining Alignment Framework for Diffusion Models with Intermediate-Step Preference Ranking
Jie Ren, Yuhang Zhang, Dongrui Liu, Xiaopeng Zhang, Qi Tian

TL;DR
This paper identifies issues in existing preference alignment methods for diffusion models and proposes TailorPO, a new framework that directly ranks intermediate samples and incorporates gradient guidance, leading to improved human-preferred image generation.
Contribution
The paper introduces TailorPO, a novel preference optimization framework that addresses inherent issues in previous methods by ranking intermediate samples and integrating gradient guidance.
Findings
TailorPO improves the generation of human-preferred images.
It resolves the gradient direction issue that arises when DPO is applied directly to noisy intermediate samples.
It aligns diffusion models more closely with human preferences.
Abstract
Direct preference optimization (DPO) has shown success in aligning diffusion models with human preference. Previous approaches typically assume a consistent preference label between final generations and noisy samples at intermediate steps, and directly apply DPO to these noisy samples for fine-tuning. However, we theoretically identify inherent issues in this assumption and its impacts on the effectiveness of preference alignment. We first demonstrate the inherent issues from two perspectives: gradient direction and preference order, and then propose a Tailored Preference Optimization (TailorPO) framework for aligning diffusion models with human preference, underpinned by some theoretical insights. Our approach directly ranks intermediate noisy samples based on their step-wise reward, and effectively resolves the gradient direction issues through a simple yet efficient design.…
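The core idea in the abstract, ranking intermediate noisy samples by a step-wise reward and then applying a DPO-style objective to the ranked pair, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the scalar log-probabilities, the `beta` value, and the assumption that the two candidates are compared at the same denoising step are all hypothetical choices made for illustration. The loss is the standard DPO form, the negative log-sigmoid of the scaled difference in policy-versus-reference log-ratios.

```python
import math


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def rank_pair(reward_a: float, reward_b: float) -> tuple[int, int]:
    """Return (winner_index, loser_index) from two step-wise rewards.

    Hypothetical helper: the preference label comes from rewards measured
    on the intermediate samples themselves, not from the final images.
    """
    return (0, 1) if reward_a >= reward_b else (1, 0)


def dpo_loss(logp_policy, logp_ref, rewards, beta: float = 0.1) -> float:
    """Standard DPO loss for one pair of intermediate samples.

    logp_policy[i], logp_ref[i]: log-probability of candidate i under the
    fine-tuned model and the frozen reference model (assumed inputs).
    rewards[i]: step-wise reward of candidate i (assumed input).
    """
    w, l = rank_pair(rewards[0], rewards[1])
    margin = (logp_policy[w] - logp_ref[w]) - (logp_policy[l] - logp_ref[l])
    return -log_sigmoid(beta * margin)


# Policy equal to reference: the loss sits at log(2).
baseline = dpo_loss([-1.0, -2.0], [-1.0, -2.0], rewards=[0.9, 0.2])
# Policy shifts probability toward the higher-reward sample: the loss drops.
improved = dpo_loss([-0.5, -2.5], [-1.0, -2.0], rewards=[0.9, 0.2])
print(baseline, improved)
```

In this sketch, swapping the two rewards flips which candidate is treated as preferred, so the same log-probabilities would then be penalized rather than rewarded.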
Taxonomy
Topics: Transportation Planning and Optimization · Data Management and Algorithms · Multi-Criteria Decision Making
Methods: Direct Preference Optimization · Diffusion
