David and Goliath: Small One-step Model Beats Large Diffusion with Score Post-training

Weijian Luo; Colin Zhang; Debing Zhang; Zhengyang Geng

arXiv:2410.20898 · cs.CV · June 6, 2025


TL;DR

This paper introduces Diff-Instruct*, a data-efficient post-training method for small one-step text-to-image models. Using a novel score-based reinforcement learning approach, the post-trained small models outperform large diffusion models on human preference benchmarks.

Contribution

It proposes a novel score-based divergence regularization for RLHF that lets small models be post-trained effectively enough to surpass large diffusion models in both quality and efficiency.
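In schematic form, the regularized post-training problem can be written as below. The notation (one-step generator g_θ with noise z and prompt c, reward r, regularization weight α, time weighting w(t), distance function d) is assumed here for illustration and is not copied from the paper. Replacing the score-based divergence 𝒟 with the KL divergence recovers the conventional KL-regularized RLHF objective.

```latex
\min_{\theta}\;
  -\,\mathbb{E}_{c,\;z}\!\left[\, r\big(g_{\theta}(z, c),\, c\big) \right]
  \;+\; \alpha\,\mathcal{D}\big(p_{\theta} \,\|\, p_{\mathrm{ref}}\big),
\qquad
\mathcal{D}\big(p_{\theta} \,\|\, p_{\mathrm{ref}}\big)
  = \int_{0}^{T} w(t)\,
    \mathbb{E}_{x_t \sim p_{\theta,t}}
    \Big[ d\big( s_{p_{\theta,t}}(x_t) - s_{p_{\mathrm{ref},t}}(x_t) \big) \Big]\,\mathrm{d}t
```

Here s_{p_t}(x) = ∇_x log p_t(x) is the score of a distribution after the forward noising process has run to time t, and d is a distance function with d(0) = 0, so the penalty vanishes when the generator's diffused distribution matches the reference's at every noise level. Prompt conditioning is omitted from p_θ and p_ref for brevity.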

Findings

01. A small 2.6B-parameter model outperforms a 12B-parameter diffusion model on key benchmarks.
02. Score-based regularization improves post-training stability and performance.
03. The small model achieves higher scores with significantly less inference time.

Abstract

We propose Diff-Instruct* (DI*), a data-efficient post-training approach for one-step text-to-image generative models that improves their alignment with human preferences without requiring image data. Our method frames alignment as online reinforcement learning from human feedback (RLHF): the one-step model is optimized to maximize a human reward function while being regularized to stay close to a reference diffusion process. Unlike traditional RLHF approaches, which use the Kullback-Leibler divergence as the regularizer, we introduce a novel general score-based divergence regularization that substantially improves both performance and post-training stability. Although the general score-based RLHF objective is intractable to optimize directly, we derive a tractable loss function that is, in theory, strictly equivalent and whose gradient can be computed efficiently for optimization. We introduce…
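To make the idea of a score-based regularizer concrete, here is a toy sketch in Python/NumPy. It is not the paper's tractable loss: it evaluates a weighted score-based divergence directly, which is only possible because both the "generator" and the "reference" are 1-D Gaussians whose diffused scores are known in closed form. The variance-exploding forward process x_t = x_0 + σ_t·ε, the pseudo-Huber distance d, the weighting w(σ) = σ², the noise grid, and `toy_reward` are all assumptions chosen for illustration.

```python
import numpy as np

def gaussian_score(x, mean, var):
    """Score (gradient of the log-density) of N(mean, var) evaluated at x."""
    return -(x - mean) / var

def pseudo_huber(y, c=1.0):
    """Pseudo-Huber distance d(y) = sqrt(y^2 + c^2) - c, so d(0) = 0 and d >= 0."""
    return np.sqrt(y ** 2 + c ** 2) - c

def score_divergence(gen, ref, sigmas, n_samples=4096, c=1.0, seed=0):
    """Monte Carlo estimate of a weighted score-based divergence between two
    1-D Gaussians diffused by the (assumed) VE process x_t = x_0 + sigma * eps.

    gen, ref: (mean, std) tuples for the generator and reference distributions.
    Uses the hypothetical weight w(sigma) = sigma^2 and averages over the given
    noise levels instead of integrating over t.
    """
    rng = np.random.default_rng(seed)
    (mg, sg), (mr, sr) = gen, ref
    total = 0.0
    for sigma in sigmas:
        var_g = sg ** 2 + sigma ** 2  # marginal variance of the diffused generator
        var_r = sr ** 2 + sigma ** 2  # marginal variance of the diffused reference
        # x_t is drawn from the generator's diffused marginal, matching the definition.
        x_t = mg + np.sqrt(var_g) * rng.standard_normal(n_samples)
        diff = gaussian_score(x_t, mg, var_g) - gaussian_score(x_t, mr, var_r)
        total += sigma ** 2 * pseudo_huber(diff, c).mean()
    return total / len(sigmas)

def toy_reward(x, target=2.0):
    """Hypothetical reward model that prefers samples close to `target`."""
    return -(x - target) ** 2

def rlhf_objective(gen, ref, sigmas, alpha=1.0, n_samples=4096, seed=0):
    """Loss = -E[r(x_0)] + alpha * D_score(p_gen || p_ref); lower is better."""
    rng = np.random.default_rng(seed)
    mg, sg = gen
    x0 = mg + sg * rng.standard_normal(n_samples)
    reward_term = -toy_reward(x0).mean()
    return reward_term + alpha * score_divergence(gen, ref, sigmas, n_samples, seed=seed)

if __name__ == "__main__":
    sigmas = np.geomspace(0.01, 10.0, 16)
    ref = (0.0, 1.0)
    for mean in (0.0, 1.0, 2.0):
        print(mean, rlhf_objective((mean, 1.0), ref, sigmas, alpha=5.0))
```

When the generator equals the reference, the two scores coincide at every noise level and the penalty is exactly zero; as the generator drifts toward the reward's preferred region, the penalty grows, and `alpha` sets the trade-off between reward and staying close to the reference.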


Code & Models

Repositories

pkulwj1994/diff_instruct_star
PyTorch · Official

Models

mrfatso/Diff-InstructStar-GGUF (Hugging Face model · 62 downloads)

Videos

David and Goliath: Small One-step Model Beats Large Diffusion with Score Post-training (SlidesLive)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling

Methods: Contrastive Language-Image Pre-training · ALIGN · Diffusion