Q-Diffusion: Quantizing Diffusion Models
Xiuyu Li, Yijiang Liu, Long Lian, Huanrui Yang, Zhen Dong, Daniel Kang, Shanghang Zhang, Kurt Keutzer

TL;DR
Q-Diffusion introduces a novel post-training quantization method tailored for diffusion models, enabling 4-bit compression with minimal performance loss and faster inference, thus enhancing efficiency for image synthesis tasks.
Contribution
The paper presents a new post-training quantization (PTQ) technique designed specifically for diffusion models, addressing their multi-timestep pipeline, model architecture, and activation distributions.
Findings
Quantizes diffusion models to 4-bit with minimal FID increase
Maintains high generation quality in text-guided image synthesis
Achieves training-free quantization with significant efficiency gains
Abstract
Diffusion models have achieved great success in image synthesis through iterative noise estimation using deep neural networks. However, the slow inference, high memory consumption, and computation intensity of the noise estimation model hinder the efficient adoption of diffusion models. Although post-training quantization (PTQ) is considered a go-to compression method for other tasks, it does not work out-of-the-box on diffusion models. We propose a novel PTQ method specifically tailored towards the unique multi-timestep pipeline and model architecture of the diffusion models, which compresses the noise estimation network to accelerate the generation process. We identify the key difficulty of diffusion model quantization as the changing output distributions of noise estimation networks over multiple time steps and the bimodal activation distribution of the shortcut layers within the…
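To make the idea of 4-bit post-training quantization concrete, the sketch below applies plain asymmetric min-max uniform quantization to a random weight matrix with NumPy. This is a generic PTQ baseline for illustration only, not the method proposed in the paper; the function names `quantize_uniform` and `dequantize`, the random weights, and the tensor shape are all assumptions introduced here.

```python
import numpy as np

def quantize_uniform(x, n_bits=4):
    """Asymmetric min-max uniform quantization (generic baseline, hypothetical helper)."""
    qmax = 2 ** n_bits - 1                      # 15 for 4-bit
    x_min, x_max = float(x.min()), float(x.max())
    # Step size that maps [x_min, x_max] onto the integer grid [0, qmax].
    scale = (x_max - x_min) / qmax if x_max > x_min else 1.0
    zero_point = int(round(-x_min / scale))     # integer code that represents 0.0
    q = np.clip(np.round(x / scale) + zero_point, 0, qmax).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Map integer codes back to approximate floating-point values."""
    return (q.astype(np.float32) - zero_point) * scale

# Stand-in for one layer's weights (assumed shape and distribution).
rng = np.random.default_rng(0)
weights = rng.normal(0.0, 0.05, size=(64, 64)).astype(np.float32)

q, scale, zp = quantize_uniform(weights, n_bits=4)
recon = dequantize(q, scale, zp)
max_err = float(np.abs(weights - recon).max())
print(f"scale={scale:.5f} zero_point={zp} max_abs_error={max_err:.5f}")
```

With a single static range like this, the reconstruction error is bounded by roughly half a quantization step for values inside the calibrated range. The abstract's point is that activation ranges in a diffusion model shift across time steps, so a naive range of this kind is not sufficient out of the box.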
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning · AI in cancer detection
Methods: Diffusion · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
