QuEST: Low-bit Diffusion Model Quantization via Efficient Selective Finetuning

Haoxuan Wang; Yuzhang Shang; Zhihang Yuan; Junyi Wu; Junchi Yan; Yan Yan

arXiv:2402.03666·cs.CV·July 16, 2025·2 cites

Open Access · 1 Repo

TL;DR

QuEST is a low-bit quantization method for diffusion models. It selectively finetunes a small set of quantized layers to reduce the memory and computational overhead that hinders deployment while limiting the resulting performance loss.

Contribution

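The paper proposes selective finetuning for low-bit quantization of diffusion models. Weights are finetuned so that imbalanced activation distributions become more quantization-friendly, and finetuning targets two critical types of layers: those that retain essential temporal information and those particularly sensitive to bit-width reduction.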

Findings

01. Achieves state-of-the-art results in high-resolution image generation tasks.

02. Effectively mitigates performance loss through layer-specific finetuning.

03. Demonstrates robustness across multiple bit-width settings.

Abstract

The practical deployment of diffusion models is still hindered by the high memory and computational overhead. Although quantization paves a way for model compression and acceleration, existing methods face challenges in achieving low-bit quantization efficiently. In this paper, we identify imbalanced activation distributions as a primary source of quantization difficulty, and propose to adjust these distributions through weight finetuning to be more quantization-friendly. We provide both theoretical and empirical evidence supporting finetuning as a practical and reliable solution. Building on this approach, we further distinguish two critical types of quantized layers: those responsible for retaining essential temporal information and those particularly sensitive to bit-width reduction. By selectively finetuning these layers under both local and global supervision, we mitigate…
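To illustrate why imbalanced activation distributions make low-bit quantization difficult, here is a minimal sketch. It is not the paper's method: it assumes a plain min-max uniform quantizer and synthetic Gaussian "activations" with a few injected outliers. The names `uniform_quantize`, `mse`, and `bulk_error`, and all numeric values, are hypothetical choices made for illustration.

```python
import random


def uniform_quantize(values, num_bits):
    """Min-max uniform quantization followed by dequantization (assumed scheme)."""
    lo, hi = min(values), max(values)
    levels = 2 ** num_bits - 1
    if hi == lo:
        return list(values)
    scale = (hi - lo) / levels
    # Snap each value to the nearest of the evenly spaced levels in [lo, hi].
    return [round((v - lo) / scale) * scale + lo for v in values]


def mse(a, b):
    """Mean squared error between two equally long sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def bulk_error(values, num_bits, n_bulk):
    """Quantize all values, then measure the error on the first n_bulk only."""
    quantized = uniform_quantize(values, num_bits)
    return mse(values[:n_bulk], quantized[:n_bulk])


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Balanced case: synthetic activations drawn from a standard Gaussian.
balanced = [rng.gauss(0.0, 1.0) for _ in range(10_000)]

# Imbalanced case: the same bulk, but ten values replaced by large outliers
# that stretch the quantization range (the value 50.0 is arbitrary).
n_bulk = len(balanced) - 10
imbalanced = balanced[:n_bulk] + [50.0] * 10

err_balanced = bulk_error(balanced, 4, n_bulk)
err_imbalanced = bulk_error(imbalanced, 4, n_bulk)

print(f"4-bit error on bulk, balanced:   {err_balanced:.4f}")
print(f"4-bit error on bulk, imbalanced: {err_imbalanced:.4f}")
```

Under these assumptions, the outliers stretch the quantization range so that most of the 4-bit levels cover a nearly empty region, and the error on the bulk of the values grows sharply. The abstract's proposal addresses this kind of imbalance by finetuning weights so that activation distributions become more quantization-friendly.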

Code & Models

Repositories

hatchetProject/QuEST · PyTorch · Official

Taxonomy

Topics: Neural Networks and Applications · Neural Networks and Reservoir Computing

Methods: Diffusion