Diff-Instruct++: Training One-step Text-to-image Generator Model to Align with Human Preferences

Weijian Luo

arXiv:2410.18881·cs.CV·June 6, 2025

TL;DR

Diff-Instruct++ is a fast-converging, image data-free method for aligning one-step text-to-image generators with human preferences. It frames alignment as RLHF-style reward maximization with a divergence regularizer, and it is reported to outperform existing open-source models in human preference evaluation.

Contribution

The paper introduces Diff-Instruct++, presented as the first image data-free human preference alignment method for one-step text-to-image models, together with a theoretical account of how CFG connects to RLHF.

Findings

01 Achieves high aesthetic and human preference scores on the COCO dataset

02 Outperforms existing open-source models in human preference evaluation

03 Provides theoretical insights linking CFG with RLHF in diffusion models

Abstract

One-step text-to-image generator models offer advantages such as fast inference, flexible architectures, and state-of-the-art generation performance. In this paper, we study the problem of aligning one-step generator models with human preferences for the first time. Inspired by the success of reinforcement learning from human feedback (RLHF), we formulate the alignment problem as maximizing expected human reward functions while adding an Integral Kullback-Leibler divergence term to prevent the generator from diverging. By overcoming technical challenges, we introduce Diff-Instruct++ (DI++), the first fast-converging and image data-free human preference alignment method for one-step text-to-image generators. We also introduce novel theoretical insights, showing that using classifier-free guidance (CFG) for diffusion distillation is secretly doing RLHF with DI++. Such an interesting finding brings…
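
The objective described in the abstract can be written out as a rough sketch. The notation is assumed for illustration and does not come from this page: $g_\theta$ is the one-step generator, $c$ a text prompt, $z$ the input noise, $r$ a reward model, $\beta$ a regularization weight, $p_{\theta,t}$ and $p_{\mathrm{ref},t}$ the generator and reference distributions diffused to noise level $t$, and $w(t)$ a weighting function. The paper's exact formulation may differ.

```latex
% Sketch under assumed notation: reward maximization with an Integral KL penalty.
\max_{\theta}\;
  \mathbb{E}_{c,\; z \sim p_z,\; x_0 = g_\theta(z, c)} \big[ r(x_0, c) \big]
  \;-\; \beta \, \mathcal{D}_{\mathrm{IKL}}\big(p_\theta \,\|\, p_{\mathrm{ref}}\big),
\qquad
\mathcal{D}_{\mathrm{IKL}}\big(p_\theta \,\|\, p_{\mathrm{ref}}\big)
  = \int_0^T w(t)\, \mathcal{D}_{\mathrm{KL}}\big(p_{\theta,t} \,\|\, p_{\mathrm{ref},t}\big)\, \mathrm{d}t .
```

Read this way, the first term rewards images that humans prefer, while the second keeps the generator close to a reference distribution across noise levels, which is the role the abstract gives to the Integral Kullback-Leibler term.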

Code & Models

Repositories

pkulwj1994/diff_instruct_star
pytorch

Taxonomy

Topics: Mathematics, Computing, and Information Processing · Handwritten Text Recognition Techniques · Image Retrieval and Classification Techniques

Methods: ALIGN · Diffusion