Low-Bitwidth Floating Point Quantization for Efficient High-Quality Diffusion Models
Cheng Chen, Christina Giannoula, Andreas Moshovos

TL;DR
This paper introduces a floating-point quantization approach for diffusion models that degrades image quality less than traditional integer quantization, especially at low bitwidths.
Contribution
It proposes a novel floating-point quantization method tailored for diffusion models, demonstrating superior image quality over integer quantization at similar bitwidths.
Findings
Floating-point quantization yields higher-quality images than integer methods.
8-bit floating-point quantization maintains near full-precision quality.
Minimal degradation is observed with 4-bit weights and 8-bit activations.
Abstract
Diffusion models are emerging models that generate images by iteratively denoising random Gaussian noise using deep neural networks. These models typically exhibit high computational and memory demands, necessitating effective post-training quantization for high-performance inference. Recent works propose low-bitwidth (e.g., 8-bit or 4-bit) quantization for diffusion models; however, 4-bit integer quantization typically results in low-quality images. We observe that on several widely used hardware platforms, there is little or no difference in compute capability between floating-point and integer arithmetic operations of the same bitwidth (e.g., 8-bit or 4-bit). Therefore, we propose an effective floating-point quantization method for diffusion models that provides better image quality compared to integer quantization methods. We employ a floating-point quantization method that was…
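To make the contrast between the two number formats concrete, here is a minimal sketch of simulated ("fake") quantization to a 4-bit integer grid and to a 4-bit floating-point grid. It is not the paper's method, which the abstract shown here does not describe in full. The function names `quantize_int` and `quantize_fp`, the choice of a 1-sign / 2-exponent / 1-mantissa layout, the default bias, and the absence of inf/NaN encodings are all assumptions made for illustration.

```python
import math


def quantize_int(x, bits, max_abs):
    """Symmetric uniform integer quantization (simulated in floating point).

    Hypothetical helper: maps x onto 2*qmax + 1 evenly spaced levels
    spanning [-max_abs, max_abs].
    """
    qmax = 2 ** (bits - 1) - 1
    scale = max_abs / qmax
    q = max(-qmax, min(qmax, round(x / scale)))
    return q * scale


def quantize_fp(x, exp_bits, man_bits, bias=None):
    """Round x to the nearest value of a small sign/exponent/mantissa format.

    Hypothetical helper. Assumptions: no inf/NaN encodings (every exponent
    code holds finite values), subnormals are supported, and out-of-range
    inputs saturate to the largest representable magnitude.
    """
    if bias is None:
        bias = 2 ** (exp_bits - 1) - 1
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    a = abs(x)

    min_exp = 1 - bias                      # exponent of the smallest normal
    max_exp = (2 ** exp_bits - 1) - bias    # exponent of the largest normal
    max_val = (2.0 - 2.0 ** -man_bits) * 2.0 ** max_exp

    # frexp returns a = m * 2**e with 0.5 <= m < 1, so floor(log2(a)) = e - 1.
    _, e = math.frexp(a)
    exp = max(e - 1, min_exp)               # subnormals share the minimum exponent

    # Spacing between representable values at this exponent.
    step = 2.0 ** (exp - man_bits)
    q = round(a / step) * step
    return sign * min(q, max_val)


if __name__ == "__main__":
    for v in (0.3, 1.4, 2.7, 5.2):
        print(v, quantize_int(v, 4, 6.0), quantize_fp(v, 2, 1))
```

With these assumptions the floating-point grid has magnitudes 0, 0.5, 1, 1.5, 2, 3, 4 and 6: the spacing is finer near zero and coarser near the maximum, whereas the integer grid over the same range uses a constant step of 6/7. Which grid suits a given tensor depends on how its values are distributed.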
Taxonomy
Topics: Advanced Data Compression Techniques · Image and Signal Denoising Methods · Advanced Adaptive Filtering Techniques
Methods: Diffusion
