MPQ-Diff: Mixed Precision Quantization for Diffusion Models
Rocco Manz Maruzzelli, Basile Lewandowski, Lydia Y. Chen

TL;DR
This paper introduces MPQ-Diff, a mixed precision quantization method for diffusion models that dynamically allocates bit-widths to layers based on their importance, significantly improving image generation quality and sampling efficiency.
Contribution
The paper proposes a novel mixed precision quantization scheme for diffusion models that uses network orthogonality to determine layer importance, reducing profiling overhead and enhancing performance.
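The idea of ranking layers by cross-layer correlation can be illustrated with a small sketch. This is not the paper's metric or allocation rule, which this summary does not spell out. It assumes each layer is summarized by one flattened feature vector, scores a layer as one minus its mean absolute cosine similarity to the other layers, and assumes that more orthogonal layers deserve the higher bit-width. The names `orthogonality_scores`, `allocate_bits`, and `high_fraction` are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0

def orthogonality_scores(features):
    """Assumed proxy: 1 - mean |cosine| between a layer and every other layer.

    `features` holds one flattened feature vector per layer (at least two layers).
    """
    n = len(features)
    scores = []
    for i in range(n):
        others = [abs(cosine(features[i], features[j])) for j in range(n) if j != i]
        scores.append(1.0 - sum(others) / len(others))
    return scores

def allocate_bits(scores, bit_choices=(4, 8), high_fraction=0.5):
    """Give the highest-scoring fraction of layers the larger bit-width (assumed rule)."""
    n_high = round(len(scores) * high_fraction)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    bits = [min(bit_choices)] * len(scores)
    for i in ranked[:n_high]:
        bits[i] = max(bit_choices)
    return bits

# Toy example: layers 0 and 1 are nearly parallel, layer 2 is orthogonal to both.
features = [[1.0, 0.0, 0.0], [1.0, 0.1, 0.0], [0.0, 0.0, 1.0]]
scores = orthogonality_scores(features)
bits = allocate_bits(scores, high_fraction=1 / 3)
```

In this toy case only the third layer receives 8 bits, because it is the least correlated with the rest. A per-step variant would recompute the scores for each sampling step that is profiled.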
Findings
Significant FID score improvements on LSUN and ImageNet datasets.
Effective allocation of different bit-widths to layers improves sampling speed and quality.
Demonstrates the benefits of mixed precision quantization over fixed precision in diffusion models.
Abstract
Diffusion models (DMs) generate remarkably high-quality images via a stochastic denoising process, which unfortunately incurs high sampling time. Post-training quantization of diffusion models to fixed bit-widths, e.g., 4 bits for weights and 8 bits for activations, has been shown to accelerate sampling while maintaining image quality. Motivated by the observation that the cross-layer dependency of DMs varies across layers and sampling steps, we propose a mixed precision quantization scheme, MPQ-Diff, which allocates different bit-widths to the weights and activations of the layers. We advocate using the cross-layer correlation of a given layer, termed the network orthogonality metric, as a proxy for the relative importance of a layer per sampling step. We further adopt a uniform sampling scheme to avoid the excessive profiling overhead of estimating orthogonality across all…
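To make the bit-width trade-off concrete, the sketch below applies a generic symmetric uniform quantizer to a small list of made-up weights at 4 and 8 bits and compares the reconstruction error. The quantizer used in the paper is not described in this summary, so `fake_quantize`, `mse`, and the sample values are illustrative assumptions only.

```python
def fake_quantize(values, bits):
    """Symmetric uniform quantization: snap to a signed integer grid, then map back to floats.

    Generic textbook quantizer, assumed here for illustration.
    """
    qmax = 2 ** (bits - 1) - 1                    # e.g. 7 for 4 bits, 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]

def mse(a, b):
    """Mean squared error between two equal-length lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

# Hypothetical weights, not taken from any model.
weights = [0.91, -0.42, 0.07, -0.88, 0.33, 0.5]
err4 = mse(weights, fake_quantize(weights, 4))
err8 = mse(weights, fake_quantize(weights, 8))
```

On these values the 8-bit error is smaller than the 4-bit error, which is the gap a mixed precision scheme exploits by spending the extra bits only on the layers where they matter most.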
Taxonomy
Topics: Image and Signal Denoising Methods · Advanced Data Compression Techniques
