Qua$^2$SeDiMo: Quantifiable Quantization Sensitivity of Diffusion Models

Keith G. Mills; Mohammad Salameh; Ruichen Chen; Negar Hassanpour; Wei Lu; Di Niu

arXiv:2412.14628·cs.CV·December 20, 2024

TL;DR

This paper introduces Qua$^2$SeDiMo, a framework for analyzing and optimizing quantization sensitivity in diffusion models, enabling high-quality mixed-precision quantization across various architectures.

Contribution

It provides explainable insights into quantization effects on different model layers, guiding effective mixed-precision quantization for diffusion models including U-Nets and Transformers.

Findings

01

Achieved mixed-precision weight quantization at 3.4 to 3.9 bits on various diffusion models.

02

Outperformed existing methods in quantitative metrics and image quality.

03

Provided a systematic approach for quantization sensitivity analysis.
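
A fractional bit-width such as those in the first finding arises naturally under mixed precision, where different layers get different integer bit-widths. The sketch below shows one common way to report it, as a parameter-weighted average. The layer sizes and bit assignments are hypothetical and are not taken from the paper.

```python
def average_bits(layers):
    """Parameter-weighted average bit-width.

    layers: list of (num_params, bits) pairs, one per weight layer.
    """
    total_params = sum(n for n, _ in layers)
    total_bits = sum(n * b for n, b in layers)
    return total_bits / total_params


# Hypothetical assignment: most weights at 4 bits, a quarter at 3 bits.
example_layers = [
    (3_000_000, 4),
    (1_000_000, 3),
]

print(average_bits(example_layers))  # 3.75
```

Moving more parameters to the lower precision pulls the average down, which is how a model can land between two integer bit-widths.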

Abstract

Diffusion Models (DM) have democratized AI image generation through an iterative denoising process. Quantization is a major technique to alleviate the inference cost and reduce the size of DM denoiser networks. However, as denoisers evolve from variants of convolutional U-Nets toward newer Transformer architectures, it is of growing importance to understand the quantization sensitivity of different weight layers, operations and architecture types to performance. In this work, we address this challenge with Qua$^2$SeDiMo, a mixed-precision Post-Training Quantization framework that generates explainable insights on the cost-effectiveness of various model weight quantization methods for different denoiser operation types and block structures. We leverage these insights to make high-quality mixed-precision quantization decisions for a myriad of diffusion models ranging from foundational…
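
To make the idea of weight quantization sensitivity concrete, here is a minimal sketch of generic round-to-nearest uniform (min-max) quantization and the reconstruction error it leaves at different bit-widths. This is a textbook baseline for illustration only, not one of the specific quantization methods the framework evaluates. The function names and the toy weight values are assumptions.

```python
def quantize_uniform(weights, bits):
    """Quantize to 2**bits evenly spaced levels over [min, max], then dequantize."""
    steps = 2 ** bits - 1          # number of intervals between levels
    lo, hi = min(weights), max(weights)
    if hi == lo:                   # constant tensor: nothing to quantize
        return list(weights)
    scale = (hi - lo) / steps
    return [lo + round((w - lo) / scale) * scale for w in weights]


def mean_squared_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Toy weights standing in for one layer (hypothetical values).
weights = [-0.82, -0.31, -0.05, 0.0, 0.12, 0.47, 0.9]

for bits in (8, 4, 3):
    err = mean_squared_error(weights, quantize_uniform(weights, bits))
    print(f"{bits}-bit MSE: {err:.6f}")
```

A layer whose output quality degrades sharply as the bit-width drops would be called more sensitive, and a mixed-precision scheme would tend to keep it at higher precision.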

Code & Models

Datasets

kgmills/Qua2SeDiMo
dataset · 93 dl

Taxonomy

Methods: Attention Is All You Need · Linear Layer · Dropout · Diffusion · Multi-Head Attention · Adam · Layer Normalization · Position-Wise Feed-Forward Layer · Label Smoothing · Residual Connection