BK-SDM: A Lightweight, Fast, and Cheap Version of Stable Diffusion

Bo-Kyeong Kim; Hyoung-Kyu Song; Thibault Castells; Shinkook Choi

arXiv:2305.15798 · cs.LG · December 3, 2024 · 5 cites

TL;DR

This paper introduces BK-SDM, a lightweight and efficient version of Stable Diffusion that uses block pruning and feature distillation to significantly reduce model size and inference time while maintaining competitive image generation quality.

Contribution

The authors propose an architectural reduction method for Stable Diffusion models (SDMs) that combines block pruning with feature distillation, enabling low-cost training and deployment on edge devices.

Findings

01 Achieved a 30-50% reduction in model size, MACs, and latency.

02 Compact models can imitate the original SDMs after distillation retraining with only 13 A100 days and a tiny dataset.

03 Enabled deployment on edge devices with 4-second inference.

Abstract

Text-to-image (T2I) generation with Stable Diffusion models (SDMs) involves high computing demands due to billion-scale parameters. To enhance efficiency, recent studies have reduced sampling steps and applied network quantization while retaining the original architectures. The lack of architectural reduction attempts may stem from worries over expensive retraining for such massive models. In this work, we uncover the surprising potential of block pruning and feature distillation for low-cost general-purpose T2I. By removing several residual and attention blocks from the U-Net of SDMs, we achieve 30%~50% reduction in model size, MACs, and latency. We show that distillation retraining is effective even under limited resources: using only 13 A100 days and a tiny dataset, our compact models can imitate the original SDMs (v1.4 and v2.1-base with over 6,000 A100 days). Benefiting from the…
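The two ideas named in the abstract, block pruning and feature distillation, can be illustrated with a toy sketch. This is not the authors' implementation: the blocks are hypothetical scalar residual functions rather than U-Net blocks, the helper names (`make_residual_block`, `prune_blocks`, `distillation_loss`) are invented for illustration, and the loss form (a task term plus output-level and feature-level mean squared errors with equal weights) is an assumption about what feature distillation typically looks like.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Block = Callable[[Vector], Vector]


def make_residual_block(scale: float) -> Block:
    """Hypothetical toy block computing x + scale * x (scale 0.0 is an identity)."""
    return lambda x: [v + scale * v for v in x]


def run_blocks(blocks: Sequence[Block], x: Vector) -> Tuple[Vector, List[Vector]]:
    """Apply blocks in order; return the final output and each block's output."""
    features: List[Vector] = []
    for block in blocks:
        x = block(x)
        features.append(x)
    return x, features


def mse(a: Vector, b: Vector) -> float:
    """Mean squared error between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


def prune_blocks(blocks: Sequence[Block], keep: Sequence[int]) -> List[Block]:
    """Block pruning: keep only the blocks at the given indices."""
    return [blocks[i] for i in keep]


def distillation_loss(
    teacher: Sequence[Block],
    keep: Sequence[int],
    x: Vector,
    target: Vector,
    out_weight: float = 1.0,   # assumed weighting, not taken from the paper
    feat_weight: float = 1.0,  # assumed weighting, not taken from the paper
) -> float:
    """Task loss + output distillation + feature distillation for a pruned student."""
    # In this toy the student reuses the teacher's blocks; a real student would
    # hold its own trainable copy of the retained weights.
    student = prune_blocks(teacher, keep)
    t_out, t_feats = run_blocks(teacher, x)
    s_out, s_feats = run_blocks(student, x)
    task = mse(s_out, target)
    out_kd = mse(s_out, t_out)
    # Student block j is matched to the teacher block it was copied from.
    feat_kd = sum(mse(s_feats[j], t_feats[i]) for j, i in enumerate(keep))
    return task + out_weight * out_kd + feat_weight * feat_kd


# Toy teacher with four blocks; blocks 1 and 3 are identities.
teacher = [make_residual_block(s) for s in (0.5, 0.0, 0.25, 0.0)]
x = [1.0, 2.0]
# Toy target: the teacher's own output stands in for the training target.
target, _ = run_blocks(teacher, x)

loss_identity_pruned = distillation_loss(teacher, [0, 2], x, target)
loss_useful_pruned = distillation_loss(teacher, [1, 3], x, target)
print(loss_identity_pruned, loss_useful_pruned)
```

In this toy, dropping the two identity blocks leaves every matched feature unchanged, while dropping the two non-identity blocks yields a larger loss that retraining would then need to reduce.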

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning · Advanced Neuroimaging Techniques and Applications

Methods: Pruning · Knowledge Distillation · Latent Diffusion Model · Diffusion · U-Net