Guided Score identity Distillation for Data-Free One-Step Text-to-Image Generation

Mingyuan Zhou; Zhendong Wang; Huangjie Zheng; Hai Huang

arXiv:2406.01561·cs.CV·February 11, 2025

TL;DR

This paper introduces a data-free method for distilling pretrained Stable Diffusion text-to-image models into one-step generators, without access to the original training data. It reports a record-low FID of 8.15 on COCO-2014.

Contribution

The authors propose a data-free guided distillation approach that augments Score identity Distillation (SiD) with Long and Short Classifier-Free Guidance (LSG) for efficient one-step text-to-image generation.

Findings

01. Achieves a record-low FID of 8.15 on COCO-2014 without using real training data.

02. Rapidly improves image quality and text alignment while training only on synthetic data.

03. Maintains competitive CLIP scores while reducing generation to a single step.

Abstract

Diffusion-based text-to-image generation models trained on extensive text-image pairs have demonstrated the ability to produce photorealistic images aligned with textual descriptions. However, a significant limitation of these models is their slow sample generation process, which requires iterative refinement through the same network. To overcome this, we introduce a data-free guided distillation method that enables the efficient distillation of pretrained Stable Diffusion models without access to the real training data, often restricted due to legal, privacy, or cost concerns. This method enhances Score identity Distillation (SiD) with Long and Short Classifier-Free Guidance (LSG), an innovative strategy that applies Classifier-Free Guidance (CFG) not only to the evaluation of the pretrained diffusion model but also to the training and evaluation of the fake score network. We optimize…
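The abstract's central ingredient, Classifier-Free Guidance (CFG), can be illustrated with a minimal sketch. This shows only the generic CFG combination rule, written in the convention where a scale of 1 recovers the conditional prediction. It is not the paper's LSG procedure, and it does not attempt to reproduce how LSG chooses scales for the pretrained and fake score networks. The function name `cfg_combine` and the toy vectors are hypothetical stand-ins for real network outputs.

```python
def cfg_combine(pred_cond, pred_uncond, scale):
    """Generic classifier-free guidance on two predictions.

    pred_cond   : prediction given the text prompt (list of floats)
    pred_uncond : prediction given an empty prompt (list of floats)
    scale       : guidance scale; 1.0 returns pred_cond, 0.0 returns pred_uncond
    """
    if len(pred_cond) != len(pred_uncond):
        raise ValueError("predictions must have the same length")
    # Move from the unconditional prediction toward (and past) the conditional one.
    return [u + scale * (c - u) for c, u in zip(pred_cond, pred_uncond)]


# Toy stand-ins for network outputs (assumed values, for illustration only).
cond = [1.0, 2.0]
uncond = [0.0, 1.0]
guided = cfg_combine(cond, uncond, 1.5)
```

In a real pipeline the two inputs would come from the same denoising network evaluated with and without the text condition; a scale above 1 extrapolates beyond the conditional prediction, which typically trades sample diversity for stronger prompt alignment.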

Taxonomy

Topics: Video Analysis and Summarization

Methods: Contrastive Language-Image Pre-training · Diffusion