Refining Alignment Framework for Diffusion Models with Intermediate-Step Preference Ranking

Jie Ren; Yuhang Zhang; Dongrui Liu; Xiaopeng Zhang; Qi Tian

arXiv:2502.01667 · cs.LG · February 5, 2025

TL;DR

This paper identifies issues in how existing preference alignment methods for diffusion models handle intermediate denoising steps. It proposes TailorPO, a framework that directly ranks intermediate samples and incorporates gradient guidance, so that generated images better match human preferences.

Contribution

The paper introduces TailorPO, a preference optimization framework that addresses issues inherent in previous methods by ranking intermediate noisy samples according to their step-wise reward and by integrating gradient guidance.

Findings

01. Significantly improves human-preferred image quality.
02. Effectively resolves gradient direction issues.
03. Enhances alignment with human preferences.

Abstract

Direct preference optimization (DPO) has shown success in aligning diffusion models with human preference. Previous approaches typically assume a consistent preference label between final generations and noisy samples at intermediate steps, and directly apply DPO to these noisy samples for fine-tuning. However, we theoretically identify inherent issues in this assumption and its impacts on the effectiveness of preference alignment. We first demonstrate the inherent issues from two perspectives: gradient direction and preference order, and then propose a Tailored Preference Optimization (TailorPO) framework for aligning diffusion models with human preference, underpinned by some theoretical insights. Our approach directly ranks intermediate noisy samples based on their step-wise reward, and effectively resolves the gradient direction issues through a simple yet efficient design.…
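
The abstract contrasts two ways of obtaining a preference pair at an intermediate denoising step: inheriting the label of the final generations, or ranking the noisy samples themselves by a step-wise reward. The following is a minimal, hypothetical sketch of the second idea combined with a standard DPO loss on a single Gaussian reverse step. It assumes that candidates x_{t-1} are drawn from one shared x_t, that the policy and a frozen reference model differ only in their predicted means, and that `step_reward` is a caller-supplied scorer (hypothetical; the abstract does not specify it). All function names are illustrative. The sketch also omits the gradient-guidance component, so it should not be read as the authors' TailorPO implementation.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def sample_candidates(mean: Sequence[float], sigma: float, k: int,
                      rng: random.Random) -> List[Vector]:
    """Draw k candidates x_{t-1} ~ N(mean, sigma^2 I) from one shared x_t.

    Assumption (hypothetical): `mean` is the reverse-step mean computed from x_t.
    """
    return [[m + sigma * rng.gauss(0.0, 1.0) for m in mean] for _ in range(k)]


def rank_pair(candidates: Sequence[Vector],
              step_reward: Callable[[Vector], float]) -> Tuple[Vector, Vector]:
    """Rank the noisy candidates by a step-wise reward and return
    (preferred, dispreferred), instead of reusing the final image's label."""
    ranked = sorted(candidates, key=step_reward, reverse=True)
    return ranked[0], ranked[-1]


def gaussian_log_prob(x: Sequence[float], mean: Sequence[float], sigma: float) -> float:
    """Log-density of the isotropic Gaussian N(mean, sigma^2 I) evaluated at x."""
    d = len(x)
    sq = sum((xi - mi) ** 2 for xi, mi in zip(x, mean))
    return -0.5 * sq / sigma ** 2 - d * math.log(sigma) - 0.5 * d * math.log(2 * math.pi)


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def step_dpo_loss(x_w: Vector, x_l: Vector, mean_theta: Sequence[float],
                  mean_ref: Sequence[float], sigma: float, beta: float) -> float:
    """Standard DPO loss on one reverse step p(x_{t-1} | x_t).

    Assumption: policy and frozen reference share sigma and differ only in
    their predicted means, so the Gaussian normalizers cancel in each log-ratio.
    """
    def log_ratio(x: Vector) -> float:
        return gaussian_log_prob(x, mean_theta, sigma) - gaussian_log_prob(x, mean_ref, sigma)

    return -log_sigmoid(beta * (log_ratio(x_w) - log_ratio(x_l)))


if __name__ == "__main__":
    rng = random.Random(0)
    cands = sample_candidates([0.0, 0.0], sigma=1.0, k=4, rng=rng)
    # Toy step-wise reward (hypothetical): prefer candidates closer to the origin.
    x_w, x_l = rank_pair(cands, step_reward=lambda x: -sum(v * v for v in x))
    loss = step_dpo_loss(x_w, x_l, mean_theta=[0.1, 0.0], mean_ref=[0.0, 0.0],
                         sigma=1.0, beta=1.0)
    print(f"step-wise DPO loss: {loss:.4f}")
```

Because the policy and reference transitions share the same variance, each log-ratio reduces to a difference of squared distances to the two predicted means. With identical means the loss is exactly log 2, and it decreases as the policy mean moves toward the preferred candidate and away from the dispreferred one.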

Taxonomy

Topics: Transportation Planning and Optimization · Data Management and Algorithms · Multi-Criteria Decision Making

Methods: Direct Preference Optimization · Diffusion