MPQ-DM: Mixed Precision Quantization for Extremely Low Bit Diffusion Models
Weilun Feng, Haotong Qin, Chuanguang Yang, Zhulin An, Libo Huang, Boyu Diao, Fei Wang, Renshuai Tao, Yongjun Xu, Michele Magno

TL;DR
This paper introduces MPQ-DM, a mixed-precision quantization method for diffusion models that significantly reduces performance degradation at extremely low bit-widths by addressing outlier channels and improving stability across time steps.
Contribution
The paper proposes two novel techniques, OMQ and TRD, to enhance low-bit quantization of diffusion models, achieving substantial accuracy improvements over existing methods.
Findings
58% FID decrease under W2A4 setting compared to baseline
Outlier-Driven Mixed Quantization effectively recovers accuracy
Time-Smoothed Relation Distillation stabilizes learning across time steps
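The idea behind outlier-driven mixed precision can be sketched in a few lines. This summary does not say how OMQ scores channels or distributes bits, so everything in this sketch is an assumption made for illustration: the excess-kurtosis score, the `allocate_bits` helper, and the rule of promoting and demoting the same number of channels so the layer's average bit-width is unchanged.

```python
import numpy as np

def excess_kurtosis(w):
    """Per-row excess kurtosis; heavy-tailed (outlier-rich) rows score high."""
    c = w - w.mean(axis=1, keepdims=True)
    var = np.mean(c ** 2, axis=1)
    return np.mean(c ** 4, axis=1) / (var ** 2 + 1e-12) - 3.0

def allocate_bits(w, base_bits=2, frac=0.25):
    """Hypothetical allocation rule (not taken from the paper):
    give +1 bit to the `frac` highest-scoring channels and -1 bit to the
    same number of lowest-scoring channels, keeping the mean at base_bits."""
    score = excess_kurtosis(w)
    n = len(score)
    k = int(n * frac)
    order = np.argsort(score)          # ascending by outlier score
    bits = np.full(n, base_bits)
    bits[order[n - k:]] += 1           # most outlier-heavy channels
    bits[order[:k]] -= 1               # flattest channels pay for it
    return bits

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 256))          # 8 output channels, 256 weights each
w[3, :5] = 30.0                        # plant outliers in channel 3

bits = allocate_bits(w, base_bits=2, frac=0.25)
print(bits, bits.mean())
```

With this toy input, the channel carrying the planted outliers receives the extra bit while the layer still averages 2 bits per weight.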
Abstract
Diffusion models have received wide attention in generation tasks. However, their expensive computation cost prevents their application in resource-constrained scenarios. Quantization has emerged as a practical solution that significantly reduces storage and computation by lowering the bit-width of parameters. However, existing quantization methods for diffusion models still cause severe performance degradation, especially at extremely low bit-widths (2-4 bit). The main loss in performance comes from the heavy discretization of activation values under low-bit quantization. With so few candidate activation values, weight channels that contain significant outliers are hard to quantize, and the discretized features prevent stable learning across the different time steps of the diffusion model. This paper presents MPQ-DM, a Mixed-Precision Quantization method for Diffusion Models.…
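To make the discretization problem concrete, here is a minimal sketch of plain min-max uniform quantization. It is a generic textbook quantizer, not MPQ-DM or any baseline from the paper; the `uniform_quantize` helper and the toy tensor are assumptions for illustration. It shows how few representable values remain at 2 bits and how a couple of outliers stretch the range that those values must cover.

```python
import numpy as np

def uniform_quantize(x, n_bits):
    """Min-max uniform quantization to 2**n_bits levels, then dequantize."""
    levels = 2 ** n_bits
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / (levels - 1)
    if scale == 0.0:                   # constant tensor: nothing to quantize
        return x.copy()
    q = np.clip(np.round((x - lo) / scale), 0, levels - 1)
    return q * scale + lo

rng = np.random.default_rng(0)
x = rng.normal(size=1024)              # toy tensor standing in for one channel
x[0], x[1] = 25.0, -25.0               # two outliers widen the min-max range

errors = {}
for bits in (8, 4, 2):
    deq = uniform_quantize(x, bits)
    errors[bits] = float(np.mean((x - deq) ** 2))
    print(f"{bits}-bit: {len(np.unique(deq))} distinct values, "
          f"MSE = {errors[bits]:.4f}")
```

At 2 bits the quantizer has at most four output values, and most of them are spent covering the range set by the outliers, which is the failure mode the abstract describes.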
Taxonomy
Topics: Image and Signal Denoising Methods · Advanced Data Compression Techniques · Advanced MRI Techniques and Applications
Methods: Softmax · Attention Is All You Need · Diffusion
