D-Fusion: Direct Preference Optimization for Aligning Diffusion Models with Visually Consistent Samples
Zijing Hu, Fengda Zhang, Kun Kuang

TL;DR
D-Fusion constructs visually consistent training samples for direct preference optimization (DPO). With the visual disparity between well-aligned and poorly-aligned images reduced, a diffusion model can identify which factors actually improve alignment with the text prompt.
Contribution
D-Fusion uses mask-guided self-attention fusion to produce images that are well-aligned yet visually consistent with given poorly-aligned images, and it retains denoising trajectories so that the constructed samples can be used for DPO training.
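The summary above does not spell out how the fusion works, so the following is only a minimal sketch of one plausible reading: query tokens inside a mask attend to the keys and values of a well-aligned reference image, while all other tokens attend to the base image's own keys and values. The function names (`_attend`, `fused_self_attention`), the single-head setup, and the per-token boolean mask are illustrative assumptions, not the authors' implementation.

```python
import math
from typing import List, Sequence

Vector = List[float]


def _attend(query: Sequence[float],
            keys: Sequence[Sequence[float]],
            values: Sequence[Sequence[float]]) -> Vector:
    """Scaled dot-product attention for a single query token."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    top = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(dim)]


def fused_self_attention(queries: Sequence[Sequence[float]],
                         base_keys: Sequence[Sequence[float]],
                         base_values: Sequence[Sequence[float]],
                         ref_keys: Sequence[Sequence[float]],
                         ref_values: Sequence[Sequence[float]],
                         mask: Sequence[bool]) -> List[Vector]:
    """Hypothetical fusion rule (an assumption, not taken from the paper):
    masked tokens read from the reference image, the rest from the base image."""
    if len(queries) != len(mask):
        raise ValueError("mask needs one entry per query token")
    fused = []
    for query, use_reference in zip(queries, mask):
        if use_reference:
            fused.append(_attend(query, ref_keys, ref_values))
        else:
            fused.append(_attend(query, base_keys, base_values))
    return fused
```

In this sketch the mask decides, token by token, which image supplies the attention context, which is one way a fused image could borrow the well-aligned content while staying close to the base image elsewhere.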
Findings
D-Fusion improves prompt-image alignment across various reinforcement learning algorithms.
The method produces images that are both well-aligned and visually consistent.
Experimental results validate the effectiveness of D-Fusion in alignment tasks.
Abstract
The practical applications of diffusion models have been limited by the misalignment between generated images and corresponding text prompts. Recent studies have introduced direct preference optimization (DPO) to enhance the alignment of these models. However, the effectiveness of DPO is constrained by the issue of visual inconsistency, where the significant visual disparity between well-aligned and poorly-aligned images prevents diffusion models from identifying which factors contribute positively to alignment during fine-tuning. To address this issue, this paper introduces D-Fusion, a method to construct DPO-trainable visually consistent samples. On one hand, by performing mask-guided self-attention fusion, the resulting images are not only well-aligned, but also visually consistent with given poorly-aligned images. On the other hand, D-Fusion can retain the denoising trajectories of…
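For readers unfamiliar with DPO, the sketch below shows the standard pairwise DPO objective on scalar log-probabilities: the loss is the negative log-sigmoid of the scaled difference between the policy-versus-reference log-ratio of the preferred sample and that of the rejected sample. This is the generic formulation, not necessarily the exact loss used in D-Fusion, where the terms would presumably be computed along denoising trajectories. The function name `dpo_loss`, its argument names, and the default `beta=0.1` are illustrative assumptions.

```python
import math


def dpo_loss(policy_logp_preferred: float,
             policy_logp_rejected: float,
             ref_logp_preferred: float,
             ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """Generic pairwise DPO loss: -log(sigmoid(margin)).

    margin = beta * [(log-ratio of preferred) - (log-ratio of rejected)],
    where each log-ratio is policy log-prob minus reference log-prob.
    """
    margin = beta * ((policy_logp_preferred - ref_logp_preferred)
                     - (policy_logp_rejected - ref_logp_rejected))
    # -log(sigmoid(m)) equals log(1 + exp(-m)); branch to avoid overflow.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy matches the reference model the margin is zero and the loss equals log 2; the loss falls as the policy raises the preferred sample's likelihood relative to the rejected one.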
Taxonomy
Topics: Medical Image Segmentation Techniques · Generative Adversarial Networks and Image Synthesis · Cell Image Analysis Techniques
Methods: Direct Preference Optimization · Diffusion
