Gradient-Aligned Calibration for Post-Training Quantization of Diffusion Models

Dung Anh Hoang; Cuong Pham; Trung Le; Jianfei Cai; Thanh-Toan Do

arXiv:2602.01289·cs.LG·March 3, 2026

Open Access · 3 Reviews

TL;DR

This paper introduces a gradient-aligned post-training quantization method for diffusion models that learns to assign optimal weights to calibration samples, improving quantization performance and efficiency.

Contribution

It proposes a novel PTQ approach that aligns gradients across timesteps by learning sample weights, addressing limitations of uniform calibration in diffusion models.
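To make "aligning gradients across timesteps" concrete, here is a minimal sketch. It is an illustration only, not the paper's algorithm. The gradient vectors and the weights are hypothetical toy values, and the helper names (`cosine`, `weighted_gradient`, `min_alignment`) are invented for this example. It shows that two timesteps can pull a quantization parameter in conflicting directions, and that a non-uniform weighting can yield a combined update that agrees better with both.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity; a negative value means the two gradients conflict."""
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot(a, b) / (na * nb)


def weighted_gradient(grads, weights):
    """Combine per-timestep gradients using normalized, non-negative weights."""
    total = sum(weights)
    dim = len(grads[0])
    return [sum(w * g[i] for w, g in zip(weights, grads)) / total
            for i in range(dim)]


def min_alignment(grads, weights):
    """Worst-case cosine between the combined gradient and each timestep's gradient."""
    combined = weighted_gradient(grads, weights)
    return min(cosine(combined, g) for g in grads)


# Hypothetical gradients of a calibration loss at two timesteps (toy values).
grads = [[1.0, 0.0], [-0.5, 1.0]]
uniform = [1.0, 1.0]      # uniform calibration weights
reweighted = [1.1, 1.0]   # hand-picked here; the paper learns its weights

print("pairwise cosine:", cosine(grads[0], grads[1]))
print("worst alignment, uniform:", min_alignment(grads, uniform))
print("worst alignment, reweighted:", min_alignment(grads, reweighted))
```

In this toy case the reweighted combination has a higher worst-case cosine than the uniform one. The weights here are set by hand purely for illustration; how the method actually learns them is described in the paper itself.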

Findings

1. Outperforms existing PTQ methods on CIFAR-10, LSUN-Bedrooms, and ImageNet.
2. Reduces inference time and memory usage while maintaining model accuracy.
3. Effectively aligns gradients across timesteps for better quantization quality.

Abstract

Diffusion models have shown remarkable performance in image synthesis by progressively estimating a smooth transition from a Gaussian distribution of noise to a real image. Unfortunately, their practical deployment is limited by slow inference speed, high memory usage, and the computational demands of the noise estimation process. Post-training quantization (PTQ) emerges as a promising solution to accelerate sampling and reduce memory overhead for diffusion models. Existing PTQ methods for diffusion models typically apply uniform weights to calibration samples across timesteps, which is sub-optimal since data at different timesteps may contribute differently to the diffusion process. Additionally, due to varying activation distributions and gradients across timesteps, a uniform quantization approach is sub-optimal. Each timestep requires a different gradient direction for optimal…
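The difference between uniform and weighted calibration can be sketched on a single toy layer. This is a simplified illustration under stated assumptions, not the authors' implementation. It assumes symmetric uniform quantization of weights, a squared-error reconstruction loss, and one scalar weight per timestep. The function names and all numeric values are hypothetical.

```python
def fake_quantize(x, scale, bits=4):
    """Symmetric uniform quantization: round to an integer grid, clamp, rescale."""
    qmax = 2 ** (bits - 1) - 1
    qmin = -(2 ** (bits - 1))
    q = max(qmin, min(qmax, round(x / scale)))
    return q * scale


def layer(weights, inputs):
    """A toy linear layer with a single output."""
    return sum(w * x for w, x in zip(weights, inputs))


def calibration_loss(weights, scale, samples, sample_weights, bits=4):
    """Weighted mean squared error between full-precision and quantized outputs.

    samples: list of (timestep, inputs) calibration pairs.
    sample_weights: dict mapping timestep -> non-negative weight.
    """
    q_weights = [fake_quantize(w, scale, bits) for w in weights]
    num, den = 0.0, 0.0
    for t, x in samples:
        err = layer(weights, x) - layer(q_weights, x)
        num += sample_weights[t] * err ** 2
        den += sample_weights[t]
    return num / den


# Hypothetical layer weights and calibration inputs drawn at two timesteps.
fp_weights = [0.26, -0.13]
samples = [(0, [1.0, 0.0]), (1, [0.0, 1.0])]

uniform_loss = calibration_loss(fp_weights, 0.1, samples, {0: 1.0, 1: 1.0})
weighted_loss = calibration_loss(fp_weights, 0.1, samples, {0: 1.0, 1: 3.0})
print(uniform_loss, weighted_loss)
```

With uniform weights the loss reduces to the plain mean squared error over the calibration set. Changing the per-timestep weights changes which reconstruction errors dominate the objective, and therefore which quantization parameters an optimizer would favor.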

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01 · Rating 4 · Confidence 4

Strengths

- The paper identifies a key limitation in existing PTQ methods: uniform treatment of all timesteps. It then proposes a principled weighting mechanism that learns optimal calibration weights, effectively aligning gradients across timesteps.
- The proposed method is evaluated on multiple benchmark datasets (CIFAR-10, LSUN-Bedrooms, and ImageNet), consistently outperforming prior PTQ approaches.

Weaknesses

- As of 2025, most state-of-the-art diffusion models are built upon the DiT architecture. However, this submission does not include experiments on such models, which limits the generalizability and relevance of the findings to current diffusion frameworks.
- The experimental evaluation is primarily conducted on small-scale datasets (e.g., CIFAR) with low-resolution images (e.g., 32×32). While these settings are useful for preliminary validation, they do not sufficiently demonstrate the scalabil

Reviewer 02 · Rating 6 · Confidence 2

Strengths

- Improvements in FID and sFID are verified experimentally.
- Provides a theoretical justification for the proxy objective with approximation, and reports optimization trends consistent with the theory.

Weaknesses

- Limited preliminaries on the specific techniques used (e.g., AdaRound) and related design choices.
- Evaluation metrics lean heavily on FID, leaving diversity aspects less explored in the main tables.

Reviewer 03 · Rating 8 · Confidence 4

Strengths

1. The motivation is very interesting and important. It starts from the numerical angle, looking into the gradient misalignment that arises when training quantization for different denoising steps, which I believe opens up an important area to explore.
2. The proposed method is well-designed. Both intuitions and theoretical proofs are provided, making it very clear to me.
3. The proposed method achieves a significant performance boost on the quantization task, and theoretical analysis proves that such

Weaknesses

1. Minor issue: The citations are not in the correct format, which sometimes makes reading hard. The authors should check them in the next version.
2. The visualizations could be further refined: while Fig. 1(a) presents the interesting gradient conflict phenomenon very clearly, Fig. 2 is not as straightforward as Fig. 1. I recommend that the authors remake Fig. 2 in the form of Fig. 1 to make it more impressive and more comparable.

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Cell Image Analysis Techniques · Medical Image Segmentation Techniques