DilateQuant: Accurate and Efficient Diffusion Quantization via Weight Dilation

Xuewen Liu; Zhikai Li; Minhao Jiang; Mengjuan Chen; Jianquan Li; Qingyi Gu

arXiv:2409.14307·cs.CV·July 10, 2025

Open Access · 3 Reviews

TL;DR

DilateQuant introduces a novel quantization-aware training (QAT) framework for diffusion models that combines Weight Dilation with a Temporal Parallel Quantizer and Block-wise Knowledge Distillation to improve accuracy and efficiency in low-bit quantization.

Contribution

The paper proposes Weight Dilation (WD), a Temporal Parallel Quantizer (TPQ), and Block-wise Knowledge Distillation (BKD) to improve diffusion model quantization, addressing the wide, time-varying activation ranges of diffusion models and the training cost of QAT.
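
Block-wise Knowledge Distillation is named above but not detailed on this page. The sketch below shows the generic idea under stated assumptions: each quantized (student) block is trained to reproduce the output of the matching full-precision (teacher) block using a mean-squared-error loss, and every student block is fed the teacher's input to that block so the loss terms are independent. The toy blocks, the `quantize` helper, and the choice of MSE are hypothetical illustrations, not the paper's implementation.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Block = Callable[[Vector], Vector]


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def quantize(v: Vector, step: float) -> Vector:
    """Hypothetical stand-in for a quantized block: round each value to a grid."""
    return [round(x / step) * step for x in v]


def blockwise_distillation_losses(
    teacher: List[Block], student: List[Block], x: Vector
) -> List[float]:
    """Return one MSE term per block.

    Each student block receives the teacher's input to that block, so the
    terms are independent and each can be optimized on its own.
    """
    losses = []
    h = x
    for t_block, s_block in zip(teacher, student):
        target = t_block(h)  # full-precision block output
        pred = s_block(h)  # quantized block output on the same input
        losses.append(mse(pred, target))
        h = target  # the next block is fed the teacher's activation
    return losses


# Toy two-block "network"; the student blocks quantize the teacher outputs.
teacher = [lambda v: [2 * a for a in v], lambda v: [a + 1 for a in v]]
student = [
    lambda v: quantize(teacher[0](v), 0.5),
    lambda v: quantize(teacher[1](v), 0.5),
]

losses = blockwise_distillation_losses(teacher, student, [0.3, -0.7])
print(losses)  # two per-block terms, each approximately 0.01
```

In real QAT each term would be minimized with respect to the corresponding quantized block's parameters using an autograd framework; keeping the terms per block means gradients only need to flow through one block at a time.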

Findings

01

Outperforms existing methods in accuracy.

02

Reduces resource consumption during training.

03

Ensures stable convergence in low-bit quantization.

Abstract

Model quantization is a promising method for accelerating and compressing diffusion models. Nevertheless, since post-training quantization (PTQ) fails catastrophically in low-bit settings, quantization-aware training (QAT) is essential. Unfortunately, the wide-range, time-varying activations in diffusion models sharply increase the complexity of quantization, making existing QAT methods inefficient. Equivalent scaling can effectively reduce the activation range, but previous methods leave the overall quantization error unchanged. More critically, these methods significantly disrupt the original weight distribution, resulting in poor weight initialization and difficult convergence during QAT training. In this paper, we propose a novel QAT framework for diffusion models, called DilateQuant. Specifically, we propose Weight Dilation (WD) that maximally dilates the unsaturated in-channel…
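
The equivalent scaling mentioned in the abstract rests on the identity X·W = (X·diag(s)^-1)·(diag(s)·W) for any positive per-in-channel factors s. The minimal plain-Python sketch below uses illustrative matrices and an arbitrarily chosen factor (not one derived by any particular method) to show the output staying the same while the outlier activation channel shrinks; the matching weight row grows by the same factor, which is how the quantization difficulty moves from activations to weights.

```python
from typing import List

Matrix = List[List[float]]


def matmul(x: Matrix, w: Matrix) -> Matrix:
    """Plain-Python product of x (n x k) and w (k x m)."""
    return [
        [sum(x[i][p] * w[p][j] for p in range(len(w))) for j in range(len(w[0]))]
        for i in range(len(x))
    ]


def equivalent_scale(x: Matrix, w: Matrix, s: List[float]):
    """Divide activation in-channel p by s[p] and multiply weight row p by s[p].

    For positive s, x @ w is preserved up to floating-point rounding.
    """
    x_s = [[v / s[p] for p, v in enumerate(row)] for row in x]
    w_s = [[v * s[p] for v in w[p]] for p in range(len(w))]
    return x_s, w_s


def channel_max(x: Matrix) -> List[float]:
    """Largest absolute value in each column (in-channel) of x."""
    return [max(abs(row[p]) for row in x) for p in range(len(x[0]))]


x = [[1.0, 40.0], [-2.0, -60.0]]  # in-channel 1 is an activation outlier
w = [[0.5, -0.25], [0.01, 0.02]]
s = [1.0, 20.0]  # illustrative factor, not tuned by any method

x_s, w_s = equivalent_scale(x, w, s)
y, y_s = matmul(x, w), matmul(x_s, w_s)
print(channel_max(x), channel_max(x_s))  # [2.0, 60.0] [2.0, 3.0]
```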

Peer Reviews

Decision · Submitted to ICLR 2025

Reviewer 01 · Rating 6 · Confidence 4

Strengths

The authors have conducted extensive experiments on diffusion model tasks, and the proposed method performs consistently well compared to SOTA baselines. They also conducted ablation studies to check the impact of each component and performed an efficiency analysis.

Weaknesses

1) Ideas of scaling from activations to weights have been proposed in other papers.
2) The ablation studies are not convincing; details can be seen in the Questions section.
3) I doubt whether the BKD module can be applied to large-scale diffusion models without training.
I will improve my score if the authors can address my concerns.

Reviewer 02 · Rating 6 · Confidence 4

Strengths

- The idea of the proposed Weight Dilation is novel and interesting.
- The paper is well-written and easy to understand.

Weaknesses

- The contributions of TPQ and BKD appear to be modest, especially BKD, which is well explored in the previous quantization and neural architecture search literature.
- According to the ablation study (Table 3), most of the improvement comes from knowledge distillation (BKD). Adding WD to BKD only improves the FID from 9.63 to 9.13. So is the key component actually BKD?
- The paper only validates the method on UNet-based models; however, given that DiT-based diffusion models (e.g.,

Reviewer 03 · Rating 3 · Confidence 4

Strengths

The most innovative point of this paper is the proposed Weight Dilation (WD) technique, which, through mathematically equivalent scaling, maximally expands the weights in unsaturated in-channels to the restricted range. WD can absorb activation quantization error into weight quantization at no extra cost. The activation range is reduced, making the activations easier to quantize. The weight range remains the same, which makes it easier for the model to converge during training. DilateQuant is
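
One way to read the reviewer's description of WD in code, under explicit assumptions: weights are quantized symmetrically per out-channel, so each out-channel's range is its largest absolute weight, and for each in-channel the sketch picks the largest factor that keeps every weight in that row within its out-channel's existing range. In-channels that already set some out-channel's maximum get a factor of 1 (saturated); the others are dilated, and the matching activation channel is divided by the same factor. The rule in `dilation_factors` is a hypothetical simplification, not the paper's actual search procedure.

```python
from typing import List

Matrix = List[List[float]]


def matmul(x: Matrix, w: Matrix) -> Matrix:
    """Plain-Python product of x (n x k) and w (k x m)."""
    return [
        [sum(x[i][p] * w[p][j] for p in range(len(w))) for j in range(len(w[0]))]
        for i in range(len(x))
    ]


def dilation_factors(w: Matrix) -> List[float]:
    """Hypothetical WD-style rule: dilate each in-channel (row) of w as far as
    the existing per-out-channel (column) ranges allow."""
    k, m = len(w), len(w[0])
    # Symmetric per-out-channel range: largest absolute weight in each column.
    r = [max(abs(w[p][j]) for p in range(k)) for j in range(m)]
    factors = []
    for p in range(k):
        # Largest s with s * |w[p][j]| <= r[j] for every out-channel j.
        # s >= 1 always; s == 1 means the row already sets some column's range.
        ratios = [r[j] / abs(w[p][j]) for j in range(m) if w[p][j] != 0.0]
        factors.append(min(ratios) if ratios else 1.0)
    return factors


def dilate(x: Matrix, w: Matrix, s: List[float]):
    """Scale weight rows up by s and activation columns down by s."""
    w_d = [[v * s[p] for v in w[p]] for p in range(len(w))]
    x_d = [[v / s[p] for p, v in enumerate(row)] for row in x]
    return x_d, w_d


def col_max(a: Matrix) -> List[float]:
    """Largest absolute value in each column of a."""
    return [max(abs(row[j]) for row in a) for j in range(len(a[0]))]


w = [[1.0, 0.25], [0.25, 0.5], [0.125, 0.0625]]  # 3 in-channels x 2 out-channels
x = [[0.5, -1.0, 16.0], [1.0, 2.0, -24.0]]  # in-channel 2 is an outlier

s = dilation_factors(w)  # [1.0, 1.0, 8.0]
x_d, w_d = dilate(x, w, s)
y, y_d = matmul(x, w), matmul(x_d, w_d)  # layer outputs before and after
print(col_max(w), col_max(w_d))  # [1.0, 0.5] [1.0, 0.5]
print(col_max(x), col_max(x_d))  # [1.0, 2.0, 24.0] [1.0, 2.0, 3.0]
```

Because the saturated rows keep a factor of 1, every out-channel's maximum absolute weight, and hence its quantization range, is unchanged, while the outlier activation channel shrinks from 24 to 3.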

Weaknesses

1. Table 3 shows that WD, the biggest innovation in this paper, yields little performance improvement.
2. The paper as a whole lacks novelty; for example, TPQ and BKD are not innovative.
3. Setting a scale factor for each out-channel of the weights is not a new method; it has been widely used in channel-wise quantization.
4. The paper does not show image generation results at higher resolution (e.g., 1024x1024).
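
For context on the reviewer's third point, the sketch below shows symmetric uniform fake quantization with one scale per weight out-channel, compared against a single per-tensor scale. The 4-bit default, the symmetric signed range, and the toy matrix are assumptions for illustration; Python's built-in `round` rounds halves to even, which may differ from the rounding convention a given quantizer uses.

```python
from typing import List

Matrix = List[List[float]]


def _fake_quant(v: float, scale: float, qmax: int) -> float:
    """Quantize to a signed integer grid, clamp, and map back to float."""
    return max(-qmax, min(qmax, round(v / scale))) * scale


def fake_quant_per_tensor(w: Matrix, bits: int = 4) -> Matrix:
    """One symmetric scale shared by the whole weight matrix."""
    qmax = 2 ** (bits - 1) - 1
    t_max = max(abs(v) for row in w for v in row)
    scale = t_max / qmax if t_max > 0 else 1.0
    return [[_fake_quant(v, scale, qmax) for v in row] for row in w]


def fake_quant_per_out_channel(w: Matrix, bits: int = 4) -> Matrix:
    """One symmetric scale per out-channel (column of w)."""
    qmax = 2 ** (bits - 1) - 1
    k, m = len(w), len(w[0])
    scales = []
    for j in range(m):
        c_max = max(abs(w[p][j]) for p in range(k))
        scales.append(c_max / qmax if c_max > 0 else 1.0)
    return [[_fake_quant(w[p][j], scales[j], qmax) for j in range(m)] for p in range(k)]


def mse(a: Matrix, b: Matrix) -> float:
    """Mean squared error over all entries."""
    diffs = [(u - v) ** 2 for ra, rb in zip(a, b) for u, v in zip(ra, rb)]
    return sum(diffs) / len(diffs)


# Out-channel 1 has much smaller weights than out-channel 0.
w = [[1.0, 0.01], [-0.5, 0.02], [0.25, -0.03]]
err_tensor = mse(w, fake_quant_per_tensor(w))
err_channel = mse(w, fake_quant_per_out_channel(w))
print(err_tensor > err_channel)  # True: the small column is no longer rounded to zero
```

With one shared scale, the small out-channel falls below half a quantization step and collapses to zero; a per-out-channel scale avoids this, consistent with the reviewer's remark that such scales are widely used.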


Taxonomy

Topics: Advanced Data Compression Techniques

Methods: Diffusion · Attentive Walk-Aggregating Graph Neural Network · Knowledge Distillation · ALIGN