PermuQuant: Lowering Per-Group Quantization Error by Reordering Channels for Diffusion Models

Yongsen Cheng; Kai Liu; Kaiwen Tao; Junxian Li; Zhixin Wang; Zhikai Chen; Renjing Pei; Yulun Zhang

arXiv:2605.09503 · cs.CV · May 12, 2026

TL;DR

PermuQuant reorders channels by statistical similarity before per-group quantization, improving post-training quantization of diffusion models: it substantially reduces quantization error while retaining the memory savings of low-bit quantization.

Contribution

It proposes a channel-sorting method for per-group quantization whose channel order is derived from calibration data, improving the quality of low-bit models.
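A minimal sketch of why reordering channels is safe, assuming channels are sorted by their maximum absolute value over calibration activations. That sorting criterion is a hypothetical stand-in; the page only says the ordering is based on statistical similarity measured on calibration data. Permuting a weight matrix's input channels and the matching activation channels with the same permutation leaves the layer output unchanged, so the order can be chosen purely to help quantization. All helper names (`channel_absmax`, `sort_permutation`, etc.) are illustrative and not taken from the PermuQuant repository.

```python
import random


def channel_absmax(calib):
    """Per-channel max |x| over a list of calibration activation vectors."""
    return [max(abs(x[c]) for x in calib) for c in range(len(calib[0]))]


def sort_permutation(stats):
    """Channel indices in ascending order of a statistic (hypothetical criterion)."""
    return sorted(range(len(stats)), key=lambda c: stats[c])


def permute_vec(x, perm):
    """Reorder a vector's channels: x'[j] = x[perm[j]]."""
    return [x[p] for p in perm]


def permute_cols(W, perm):
    """Reorder a weight matrix's input channels (columns) with the same permutation."""
    return [[row[p] for p in perm] for row in W]


def matvec(W, x):
    """Plain matrix-vector product W @ x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


random.seed(0)
# Hypothetical calibration set: 64 samples of an 8-channel activation,
# where channels 1, 3 and 5 have a much larger spread than the others.
spreads = [0.1, 5.0, 0.2, 4.0, 0.1, 6.0, 0.3, 0.2]
calib = [[random.gauss(0.0, s) for s in spreads] for _ in range(64)]
W = [[random.gauss(0.0, 1.0) for _ in spreads] for _ in range(3)]

perm = sort_permutation(channel_absmax(calib))

# Same permutation on W's columns and on x: W @ x is unchanged
# (up to floating-point rounding from the different summation order).
x = calib[0]
y_ref = matvec(W, x)
y_perm = matvec(permute_cols(W, perm), permute_vec(x, perm))
print("permutation:", perm)
print("max |y_ref - y_perm|:", max(abs(a - b) for a, b in zip(y_ref, y_perm)))
```

Where the preceding operation is itself a linear layer, the activation permutation can in principle be folded into that layer's output rows, avoiding a runtime shuffle; the page does not say how PermuQuant handles this.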

Findings

1. Reduces quantization error across multiple diffusion models.
2. Achieves up to 1.8× speedup and 3.5× memory reduction.
3. Outperforms existing PTQ methods in quality and efficiency.
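For context on the memory figure, here is a back-of-the-envelope bound for weight storage alone. It assumes an FP16 baseline, b-bit integer weights, and one FP16 scale per group of g weights with no zero point. The page does not state PermuQuant's bit-widths or group size, so b = 4 and g = 128 below are illustrative only:

```latex
\text{bits per weight} = b + \frac{16}{g}, \qquad
\text{ratio} = \frac{16}{b + 16/g}

b = 4,\ g = 128: \quad \frac{16}{4 + 0.125} = \frac{16}{4.125} \approx 3.88
```

End-to-end reductions can be lower than this bound when some layers or tensors remain in higher precision.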

Abstract

Large-scale visual generative models have achieved remarkable performance. However, their high computational and memory costs make deployment challenging in resource-constrained scenarios, such as interactive applications and personal single-GPU usage. Post-training quantization (PTQ) offers a practical solution by compressing pretrained models without expensive retraining. However, existing PTQ methods still suffer from severe quality degradation under extremely low-bit settings. In this paper, we identify channel ordering as an important but underexplored factor in per-group quantization. In this setting, each contiguous group shares one quantization scale. When channels with very different statistics are placed in the same group, the scale can be dominated by outliers and cause large quantization errors. Based on this observation, we propose PermuQuant, a simple and effective PTQ…
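The grouping effect described in the abstract can be illustrated with a toy example. It assumes symmetric round-to-nearest 4-bit quantization with one absmax scale per contiguous group of 4; the page does not give PermuQuant's exact quantizer or group size. The hypothetical vector has four outlier channels interleaved with small ones. In the original order every group contains an outlier, so the shared scale rounds the small values to zero. Sorting by magnitude, used here as a stand-in for calibration statistics, places the outliers in a single group and gives the other groups fine scales. The values are made up for illustration.

```python
def quantize_per_group(x, group_size, bits=4):
    """Symmetric round-to-nearest quantization; each contiguous group shares one absmax scale."""
    qmax = 2 ** (bits - 1) - 1  # 7 for signed 4-bit
    out = []
    for start in range(0, len(x), group_size):
        group = x[start:start + group_size]
        scale = max(abs(v) for v in group) / qmax or 1.0  # fall back to 1.0 for an all-zero group
        # Quantize to integers in [-qmax - 1, qmax], then dequantize back to floats.
        out.extend(max(-qmax - 1, min(qmax, round(v / scale))) * scale for v in group)
    return out


def sse(a, b):
    """Sum of squared errors; unchanged if a and b are permuted the same way."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


# Hypothetical row: an outlier channel (about 8) at every 4th position, small channels elsewhere.
x = [8.0, 0.11, -0.23, 0.17, -7.9, 0.31, -0.05, 0.19,
     8.1, -0.14, 0.27, -0.09, -8.05, 0.21, -0.12, 0.33]
GROUP = 4

err_orig = sse(x, quantize_per_group(x, GROUP))

# Reorder channels by magnitude so that similar channels share a group.
perm = sorted(range(len(x)), key=lambda c: abs(x[c]))
xp = [x[c] for c in perm]
err_sorted = sse(xp, quantize_per_group(xp, GROUP))

print(f"original order SSE: {err_orig:.4f}")
print(f"sorted order SSE:   {err_sorted:.4f}")
```

In the original order all twelve small values quantize to zero, so the error equals their total energy. After sorting, most of the remaining error comes from the outlier group, where only the largest value is reproduced exactly.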

Code & Models

Repositories

yscheng04/PermuQuant (GitHub)
