Diff-Instruct++: Training One-step Text-to-image Generator Model to Align with Human Preferences
Weijian Luo

TL;DR
Diff-Instruct++ is a fast-converging, image data-free method for aligning one-step text-to-image generators with human preferences. It follows the RLHF recipe: maximize an expected human reward while an Integral Kullback-Leibler divergence term keeps the generator from diverging.
Contribution
The paper introduces Diff-Instruct++ (DI++), the first image data-free human preference alignment method for one-step text-to-image generators, and shows theoretically that using classifier-free guidance (CFG) for diffusion distillation amounts to doing RLHF with DI++.
Findings
Achieves high aesthetic and human preference scores on the COCO dataset
Outperforms existing open-source models in human preference evaluation
Provides theoretical insights linking CFG with RLHF in diffusion models
Abstract
One-step text-to-image generator models offer advantages such as swift inference efficiency, flexible architectures, and state-of-the-art generation performance. In this paper, we study the problem of aligning one-step generator models with human preferences for the first time. Inspired by the success of reinforcement learning using human feedback (RLHF), we formulate the alignment problem as maximizing expected human reward functions while adding an Integral Kullback-Leibler divergence term to prevent the generator from diverging. By overcoming technical challenges, we introduce Diff-Instruct++ (DI++), the first, fast-converging and image data-free human preference alignment method for one-step text-to-image generators. We also introduce novel theoretical insights, showing that using CFG for diffusion distillation is secretly doing RLHF with DI++. Such an interesting finding brings…
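The objective described in the abstract (an expected human reward plus an Integral Kullback-Leibler regularizer) can be sketched as follows. This is a reading of the abstract only: every symbol below is notation assumed here for illustration, and the paper's exact formulation, weighting, and choice of reference distribution may differ.

```latex
% Assumed notation (not taken from the listing):
%   g_\theta            one-step generator with parameters \theta
%   z \sim p_z          latent noise,  c  text prompt
%   r(x_0, c)           human reward model,  \alpha > 0  reward weight
%   p_{\theta,t}        marginal of generator samples diffused to time t
%   p_{\mathrm{ref},t}  marginal of a reference diffusion model at time t
%   w(t) > 0            time weighting on [0, T]
\min_{\theta}\; \mathcal{L}(\theta)
  = \mathbb{E}_{c}\Big[
      \mathbb{E}_{z \sim p_z}\big[ -\alpha\, r\big(g_\theta(z \mid c),\, c\big) \big]
      + \int_{0}^{T} w(t)\,
        D_{\mathrm{KL}}\big( p_{\theta,t}(\cdot \mid c) \,\big\|\, p_{\mathrm{ref},t}(\cdot \mid c) \big)\,
        \mathrm{d}t
    \Big]
```

The first term pushes generated images toward higher reward; the second integrates a KL divergence over diffusion times, which is what penalizes the generator for drifting away from the reference.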
Taxonomy
Topics: Mathematics, Computing, and Information Processing · Handwritten Text Recognition Techniques · Image Retrieval and Classification Techniques
Methods: ALIGN · Diffusion
