PQD: Post-training Quantization for Efficient Diffusion Models

Jiaojiao Ye, Zhen Wang, and Linnan Jiang

arXiv:2501.00124·cs.CV·January 3, 2025

TL;DR

This paper introduces PQD, a post-training quantization method that significantly reduces the computational complexity of diffusion models, enabling efficient 8-bit or 4-bit inference without retraining, while maintaining high-quality image generation.

Contribution

The paper presents a novel time-aware post-training quantization framework for diffusion models, allowing direct quantization into low-bit formats with minimal performance loss.
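As a rough illustration of what quantizing into a low-bit format means, the sketch below applies generic uniform affine quantization to a toy list of weights and compares the round-trip error at 8 and 4 bits. This is a textbook quantizer, not the PQD procedure; the function names `quantize_dequantize` and `max_abs_error` and the toy weights are assumptions made for this example.

```python
def quantize_dequantize(values, num_bits):
    """Uniform affine fake-quantization of a list of floats (generic sketch, not PQD)."""
    qmax = 2 ** num_bits - 1                 # largest integer code, e.g. 255 for 8-bit
    lo, hi = min(values), max(values)
    scale = (hi - lo) / qmax if hi > lo else 1.0
    out = []
    for v in values:
        q = round((v - lo) / scale)          # nearest integer code
        q = max(0, min(qmax, q))             # clamp to the representable range
        out.append(lo + q * scale)           # map the code back to a float
    return out


def max_abs_error(a, b):
    """Largest element-wise absolute difference between two equal-length lists."""
    return max(abs(x - y) for x, y in zip(a, b))


# Toy "weights" spread evenly over [-0.5, 0.5] (hypothetical data).
weights = [i / 100.0 - 0.5 for i in range(101)]
err8 = max_abs_error(weights, quantize_dequantize(weights, 8))
err4 = max_abs_error(weights, quantize_dequantize(weights, 4))
print(f"8-bit max error: {err8:.5f}")
print(f"4-bit max error: {err4:.5f}")
```

The rounding error is bounded by half a quantization step, so the 4-bit grid (15 steps) is necessarily coarser than the 8-bit grid (255 steps); this is why low-bit quantization depends on good range calibration.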

Findings

01  Quantizes diffusion models to 8-bit and 4-bit with minimal FID change

02  Maintains high-quality image synthesis without retraining

03  Applicable to high-resolution text-guided image generation

Abstract

Diffusion models (DMs) have demonstrated remarkable achievements in synthesizing images of high fidelity and diversity. However, their extensive computational requirements and slow generation speed have limited their widespread adoption. In this paper, we propose a novel post-training quantization method for diffusion models (PQD), a time-aware optimization framework based on post-training quantization. The proposed framework optimizes the inference process by selecting representative samples and conducting time-aware calibration. Experimental results show that our method can directly quantize full-precision diffusion models into 8-bit or 4-bit models in a training-free manner while maintaining comparable performance, with only a small change in FID on ImageNet for unconditional image generation. Our approach demonstrates compatibility and…
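One plausible reading of "time-aware calibration" is that activation ranges are estimated separately for different parts of the denoising schedule, since activation statistics in diffusion models tend to vary with the timestep. The sketch below illustrates that idea only; it is not the authors' algorithm. The toy activation generator, the grouping of timesteps into equal buckets, and all names (`toy_activations`, `calibrate_per_group`, `scale_for`) are assumptions introduced for this example.

```python
import random


def toy_activations(t, num_steps, rng, n=256):
    """Hypothetical stand-in for a layer's activations at timestep t.

    The spread grows with t to mimic timestep-dependent activation ranges.
    """
    spread = 0.5 + 2.0 * t / (num_steps - 1)
    return [rng.gauss(0.0, spread) for _ in range(n)]


def calibrate_per_group(num_steps, num_groups, samples_per_group, seed=0):
    """Collect one (min, max) activation range per group of timesteps."""
    rng = random.Random(seed)
    group_size = num_steps // num_groups
    ranges = []
    for g in range(num_groups):
        lo, hi = float("inf"), float("-inf")
        for _ in range(samples_per_group):
            # Sample a calibration timestep from this group's interval.
            t = rng.randrange(g * group_size, (g + 1) * group_size)
            acts = toy_activations(t, num_steps, rng)
            lo, hi = min(lo, min(acts)), max(hi, max(acts))
        ranges.append((lo, hi))
    return ranges


def scale_for(t, ranges, num_steps, num_bits):
    """Quantization step size to use at timestep t, given per-group ranges."""
    group_size = num_steps // len(ranges)
    g = min(t // group_size, len(ranges) - 1)
    lo, hi = ranges[g]
    return (hi - lo) / (2 ** num_bits - 1)


ranges = calibrate_per_group(num_steps=1000, num_groups=4, samples_per_group=8)
for g, (lo, hi) in enumerate(ranges):
    print(f"group {g}: range [{lo:.2f}, {hi:.2f}]")
print("8-bit scale at t=0:  ", scale_for(0, ranges, 1000, 8))
print("8-bit scale at t=999:", scale_for(999, ranges, 1000, 8))
```

In this toy setup a single global range would be dictated by the widest group, wasting resolution on timesteps whose activations are narrow; per-group ranges avoid that at the cost of storing a few extra scale values.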

Taxonomy

Topics: Model Reduction and Neural Networks · Neural Networks and Applications

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Diffusion