Qua$^2$SeDiMo: Quantifiable Quantization Sensitivity of Diffusion Models
Keith G. Mills, Mohammad Salameh, Ruichen Chen, Negar Hassanpour, Wei Lu, Di Niu

TL;DR
This paper introduces Qua$^2$SeDiMo, a framework for analyzing and optimizing quantization sensitivity in diffusion models, enabling high-quality mixed-precision quantization across various architectures.
Contribution
It provides explainable insights into quantization effects on different model layers, guiding effective mixed-precision quantization for diffusion models including U-Nets and Transformers.
Findings
Achieved 3.4-3.9 bit weight quantization on various diffusion models.
Outperformed existing methods in quantitative metrics and image quality.
Provided a systematic approach for quantization sensitivity analysis.
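A fractional figure such as 3.4 to 3.9 bits is naturally read as an average over layers that each use an integer bit-width. The sketch below shows that arithmetic as a parameter-weighted mean. The function name, layer sizes, and bit assignments are hypothetical and are not taken from the paper.

```python
def average_bits(layers):
    """Parameter-weighted mean bit-width.

    layers: list of (num_params, bits) pairs, one per weight layer.
    """
    total_params = sum(n for n, _ in layers)
    total_bits = sum(n * b for n, b in layers)
    return total_bits / total_params


# Hypothetical layer sizes and bit assignments, for illustration only.
example_layers = [
    (1_000_000, 4),
    (1_000_000, 4),
    (500_000, 3),
]
avg = average_bits(example_layers)
print(f"average bit-width: {avg:.2f}")
```

Assigning 3 bits to the smaller layer and 4 bits to the two larger ones lands the average between 3 and 4, which is how a mixed-precision configuration can report a non-integer bit-width.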
Abstract
Diffusion Models (DM) have democratized AI image generation through an iterative denoising process. Quantization is a major technique to alleviate the inference cost and reduce the size of DM denoiser networks. However, as denoisers evolve from variants of convolutional U-Nets toward newer Transformer architectures, it is of growing importance to understand the quantization sensitivity of different weight layers, operations and architecture types to performance. In this work, we address this challenge with Qua$^2$SeDiMo, a mixed-precision Post-Training Quantization framework that generates explainable insights on the cost-effectiveness of various model weight quantization methods for different denoiser operation types and block structures. We leverage these insights to make high-quality mixed-precision quantization decisions for a myriad of diffusion models ranging from foundational…
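To make the idea of mixed-precision post-training weight quantization concrete, here is a minimal sketch that applies plain min-max uniform quantization to each layer at its own bit-width. This is a generic textbook quantizer chosen for illustration, not the quantization methods or the bit-allocation procedure used by Qua$^2$SeDiMo. The layer names, weights, and bit assignments are hypothetical.

```python
def quantize_uniform(weights, bits):
    """Min-max uniform quantization onto 2**bits evenly spaced levels."""
    lo, hi = min(weights), max(weights)
    steps = 2 ** bits - 1          # number of intervals between levels
    if hi == lo:
        return list(weights)       # constant tensor: nothing to quantize
    scale = (hi - lo) / steps
    # Round each weight to the nearest level, then map back to a float.
    return [lo + round((w - lo) / scale) * scale for w in weights]


def max_abs_error(original, quantized):
    """Largest per-weight deviation introduced by quantization."""
    return max(abs(a - b) for a, b in zip(original, quantized))


# Hypothetical toy "model": layer name -> (weights, assigned bit-width).
toy_model = {
    "conv_in": ([-0.9, -0.3, 0.0, 0.25, 0.8], 4),
    "attn_qkv": ([-0.5, -0.1, 0.05, 0.2, 0.6], 3),
}

quantized_model = {
    name: quantize_uniform(w, bits) for name, (w, bits) in toy_model.items()
}

for name, (w, bits) in toy_model.items():
    err = max_abs_error(w, quantized_model[name])
    print(f"{name}: {bits}-bit, max abs error {err:.4f}")
```

With this quantizer the worst-case error per weight is half a quantization step, so a layer given fewer bits tolerates a coarser grid. A sensitivity analysis of the kind the abstract describes would decide which layers can afford that.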
Taxonomy
Topics: Magnetic and transport properties of perovskites and related materials · Rare-earth and actinide compounds · Spectral Theory in Mathematical Physics
Methods: Attention Is All You Need · Linear Layer · Dropout · Diffusion · Multi-Head Attention · Adam · Layer Normalization · Position-Wise Feed-Forward Layer · Label Smoothing · Residual Connection
