SQ-DM: Accelerating Diffusion Models with Aggressive Quantization and Temporal Sparsity
Zichen Fan, Steve Dai, Rangharajan Venkatesan, Dennis Sylvester, Brucek Khailany

TL;DR
This paper accelerates diffusion models by combining aggressive quantization, temporal sparsity, and a specialized hardware architecture, achieving a 6.91x speed-up over traditional dense accelerators and a 51.5% reduction in energy consumption.
Contribution
It proposes a new diffusion model accelerator with mixed-precision dense-sparse architecture and a time-step-aware sparsity detector, enabling faster and more energy-efficient image generation.
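As a rough illustration of what a time-step-aware sparsity detector could do, the sketch below measures per-channel sparsity at each time step and routes each channel to either a dense or a sparse compute path. The function names, the 0.5 threshold, and the toy activations are all hypothetical; this summary does not describe how the paper's detector actually works.

```python
def channel_sparsity(channel):
    """Fraction of exactly-zero values in one activation channel."""
    return sum(1 for v in channel if v == 0) / len(channel)


def partition_channels(activations, threshold=0.5):
    """Split channel indices into (dense_ids, sparse_ids).

    `activations` is a list of channels, each a list of values.
    The threshold is an assumed value chosen only for illustration.
    """
    dense, sparse = [], []
    for idx, channel in enumerate(activations):
        if channel_sparsity(channel) >= threshold:
            sparse.append(idx)
        else:
            dense.append(idx)
    return dense, sparse


# Toy activations for three channels at two different time steps.
step_a = [[0, 0, 0, 3], [1, 2, 0, 4], [0, 0, 5, 0]]
step_b = [[1, 0, 2, 3], [0, 0, 0, 4], [0, 0, 5, 0]]

# Re-running the detector per time step lets the assignment follow the data.
plan = {
    "step_a": partition_channels(step_a),
    "step_b": partition_channels(step_b),
}
```

In this toy input, channels 0 and 1 swap between the dense and sparse paths from one time step to the next, which is the kind of evolving per-channel pattern the abstract refers to.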
Findings
Achieves 6.91x speed-up over traditional dense accelerators.
Demonstrates superior generation quality compared to existing 4-bit quantization methods.
Reduces energy consumption by 51.5%.
Abstract
Diffusion models have gained significant popularity in image generation tasks. However, generating high-quality content remains notably slow because it requires running model inference over many time steps. To accelerate these models, we propose to aggressively quantize both weights and activations, while simultaneously promoting significant activation sparsity. We further observe that the stated sparsity pattern varies among different channels and evolves across time steps. To support this quantization and sparsity scheme, we present a novel diffusion model accelerator featuring a heterogeneous mixed-precision dense-sparse architecture, channel-last address mapping, and a time-step-aware sparsity detector for efficient handling of the sparsity pattern. Our 4-bit quantization technique demonstrates superior generation quality compared to existing 4-bit methods. Our custom accelerator…
Taxonomy
Topics: Opinion Dynamics and Social Influence
Methods: Diffusion
