QuEST: Low-bit Diffusion Model Quantization via Efficient Selective Finetuning
Haoxuan Wang, Yuzhang Shang, Zhihang Yuan, Junyi Wu, Junchi Yan, Yan Yan

TL;DR
This paper introduces QuEST, a low-bit quantization method for diffusion models that finetunes only a selected subset of layers, reducing the memory and computational cost of deployment while limiting the loss in generation quality.
Contribution
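The paper proposes a selective finetuning approach for low-bit quantization of diffusion models. It adjusts imbalanced activation distributions through weight finetuning and restricts that finetuning to two kinds of critical layers: those that retain temporal information and those that are particularly sensitive to bit-width reduction.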
Findings
Achieves state-of-the-art results in high-resolution image generation tasks.
Effectively mitigates performance loss through layer-specific finetuning.
Demonstrates robustness across multiple bit-width settings.
Abstract
The practical deployment of diffusion models is still hindered by the high memory and computational overhead. Although quantization paves a way for model compression and acceleration, existing methods face challenges in achieving low-bit quantization efficiently. In this paper, we identify imbalanced activation distributions as a primary source of quantization difficulty, and propose to adjust these distributions through weight finetuning to be more quantization-friendly. We provide both theoretical and empirical evidence supporting finetuning as a practical and reliable solution. Building on this approach, we further distinguish two critical types of quantized layers: those responsible for retaining essential temporal information and those particularly sensitive to bit-width reduction. By selectively finetuning these layers under both local and global supervision, we mitigate…
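The claim that imbalanced activation distributions make low-bit quantization difficult can be illustrated with a small sketch. This is not the QuEST method or the authors' code. It assumes a plain min-max uniform quantizer and synthetic activations, and the names `uniform_quantize`, `balanced`, and `imbalanced` are hypothetical. It shows only that a few large outliers stretch the quantization range and raise the error on the bulk of the values.

```python
import random


def uniform_quantize(values, num_bits):
    """Min-max uniform quantization: snap each value to one of 2**num_bits levels."""
    lo, hi = min(values), max(values)
    steps = 2 ** num_bits - 1
    scale = (hi - lo) / steps if hi > lo else 1.0
    return [lo + round((v - lo) / scale) * scale for v in values]


def mean_squared_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


rng = random.Random(0)

# Hypothetical "balanced" activations: values concentrated around zero.
balanced = [rng.gauss(0.0, 1.0) for _ in range(10_000)]

# Hypothetical "imbalanced" activations: the same values, with the last ten
# replaced by large outliers that stretch the quantization range.
imbalanced = balanced[:-10] + [rng.uniform(40.0, 50.0) for _ in range(10)]

err_balanced = mean_squared_error(balanced, uniform_quantize(balanced, 4))
err_imbalanced = mean_squared_error(imbalanced, uniform_quantize(imbalanced, 4))

print(f"4-bit MSE, balanced:   {err_balanced:.4f}")
print(f"4-bit MSE, imbalanced: {err_imbalanced:.4f}")
```

With only 16 levels available at 4 bits, the outliers force a much coarser step size, so most activations near zero collapse onto very few levels. Per the abstract, the paper's approach is to reshape such distributions through weight finetuning. This sketch does not attempt that step.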
Taxonomy
Topics: Neural Networks and Applications · Neural Networks and Reservoir Computing
Methods: Diffusion
