PQD: Post-training Quantization for Efficient Diffusion Models
Jiaojiao Ye, Zhen Wang, and Linnan Jiang

TL;DR
This paper introduces PQD, a post-training quantization method that significantly reduces the computational complexity of diffusion models, enabling efficient 8-bit or 4-bit inference without retraining, while maintaining high-quality image generation.
Contribution
The paper presents a novel time-aware post-training quantization framework for diffusion models, allowing direct quantization into low-bit formats with minimal performance loss.
Findings
Quantizes diffusion models to 8-bit and 4-bit with minimal FID change
Maintains high-quality image synthesis without retraining
Applicable to high-resolution text-guided image generation
Abstract
Diffusion models (DMs) have demonstrated remarkable achievements in synthesizing images of high fidelity and diversity. However, their extensive computational requirements and slow generation speed have limited their widespread adoption. In this paper, we propose post-training quantization for diffusion models (PQD), a time-aware optimization framework for diffusion models based on post-training quantization. The framework optimizes the inference process by selecting representative samples and conducting time-aware calibration. Experimental results show that the method can directly quantize full-precision diffusion models into 8-bit or 4-bit models in a training-free manner while maintaining comparable performance, with only a small change in FID on ImageNet for unconditional image generation. Our approach demonstrates compatibility and…
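The abstract names two ingredients, uniform low-bit quantization and time-aware calibration, without giving details here. The sketch below is not the authors' algorithm. It is a minimal illustration of one common reading of "time-aware": keep a separate quantization range per group of denoising timesteps rather than one global range. The function names, the four-group split, and the toy activation model (a Gaussian whose spread grows with the timestep) are all assumptions made for the example.

```python
import random

def quantize(x, lo, hi, bits):
    """Uniform affine quantization of x to an integer code in [0, 2**bits - 1]."""
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    code = round((x - lo) / scale)
    return max(0, min(levels, code)), scale  # clip values outside [lo, hi]

def fake_quant(x, lo, hi, bits):
    """Quantize then dequantize, so rounding error can be measured in float space."""
    code, scale = quantize(x, lo, hi, bits)
    return lo + code * scale

def mse(samples, lo, hi, bits):
    """Mean squared quantization error over a list of floats."""
    return sum((x - fake_quant(x, lo, hi, bits)) ** 2 for x in samples) / len(samples)

def toy_activation(t, num_steps, rng):
    """Hypothetical stand-in for a layer activation whose spread grows with timestep t."""
    return rng.gauss(0.0, 0.1 + 4.9 * t / (num_steps - 1))

def collect(num_steps=100, per_step=50, seed=0):
    """Draw calibration samples for every timestep (assumed setup, not from the paper)."""
    rng = random.Random(seed)
    return {t: [toy_activation(t, num_steps, rng) for _ in range(per_step)]
            for t in range(num_steps)}

def calibrate(samples_by_t, num_groups):
    """Return one (lo, hi) range per contiguous group of timesteps."""
    steps = sorted(samples_by_t)
    size = -(-len(steps) // num_groups)  # ceiling division
    ranges = []
    for g in range(num_groups):
        pooled = [x for t in steps[g * size:(g + 1) * size] for x in samples_by_t[t]]
        ranges.append((min(pooled), max(pooled)))
    return ranges

samples_by_t = collect()
global_range = calibrate(samples_by_t, num_groups=1)[0]   # one range for all timesteps
group_ranges = calibrate(samples_by_t, num_groups=4)      # one range per timestep group

# Evaluate on the first timestep group (t = 0..24), where activations are narrowest.
early = [x for t in range(25) for x in samples_by_t[t]]
mse_global_4bit = mse(early, *global_range, bits=4)
mse_group_4bit = mse(early, *group_ranges[0], bits=4)
mse_global_8bit = mse(early, *global_range, bits=8)
print(mse_global_4bit, mse_group_4bit, mse_global_8bit)
```

In this toy setup the per-group range is never wider than the global one, so the quantization step for narrow-range timesteps shrinks and their 4-bit error drops. Whether PQD groups timesteps this way, or how it selects representative samples, is not stated in the text above.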
Taxonomy
Topics: Model Reduction and Neural Networks · Neural Networks and Applications
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Diffusion
