PTQ4DiT: Post-training Quantization for Diffusion Transformers
Junyi Wu, Haoxuan Wang, Yuzhang Shang, Mubarak Shah, Yan Yan

TL;DR
This paper introduces PTQ4DiT, a novel post-training quantization method tailored for Diffusion Transformers, enabling efficient 8-bit and 4-bit inference with minimal performance loss.
Contribution
We propose PTQ4DiT, addressing unique quantization challenges in DiTs with novel calibration techniques and an offline re-parameterization strategy for efficient deployment.
Findings
Successfully quantizes DiTs to 8-bit with minimal quality loss
Achieves 4-bit weight quantization while maintaining generation quality
Reduces computational costs for real-time diffusion applications
Abstract
The recent introduction of Diffusion Transformers (DiTs) has demonstrated exceptional capabilities in image generation by using a different backbone architecture, departing from traditional U-Nets and embracing the scalable nature of transformers. Despite their advanced capabilities, the wide deployment of DiTs, particularly for real-time applications, is currently hampered by considerable computational demands at the inference stage. Post-training Quantization (PTQ) has emerged as a fast and data-efficient solution that can significantly reduce computation and memory footprint by using low-bit weights and activations. However, its applicability to DiTs has not yet been explored and faces non-trivial difficulties due to the unique design of DiTs. In this paper, we propose PTQ4DiT, a specifically designed PTQ method for DiTs. We discover two primary quantization challenges inherent in…
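To make the abstract's notion of "low-bit weights and activations" concrete, here is a minimal sketch of generic symmetric uniform quantization. It is not the PTQ4DiT method, whose details are cut off above. The function names (`quantize_uniform`, `dequantize`, `mean_squared_error`) and the random Gaussian "weights" are assumptions made for illustration only.

```python
import random


def quantize_uniform(values, num_bits=8):
    """Map floats to signed integer codes with one shared scale.

    Generic symmetric per-tensor quantization, used here only as an
    illustration; it is not the calibration scheme proposed in the paper.
    """
    qmax = 2 ** (num_bits - 1) - 1          # 127 for 8-bit, 7 for 4-bit
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest code, then clamp into the representable range.
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float values from integer codes."""
    return [c * scale for c in codes]


def mean_squared_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Hypothetical stand-in for a weight tensor: 1024 Gaussian samples.
random.seed(0)
weights = [random.gauss(0.0, 1.0) for _ in range(1024)]

codes8, scale8 = quantize_uniform(weights, num_bits=8)
codes4, scale4 = quantize_uniform(weights, num_bits=4)

err8 = mean_squared_error(weights, dequantize(codes8, scale8))
err4 = mean_squared_error(weights, dequantize(codes4, scale4))
print(f"8-bit MSE: {err8:.6f}  4-bit MSE: {err4:.6f}")
```

With a single shared scale, the 4-bit codes have far fewer levels than the 8-bit ones, so the reconstruction error grows as the bit width drops. That gap is the general reason low-bit PTQ needs careful calibration.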
Taxonomy
Topics: Advanced Memory and Neural Computing · Phase-change materials and chalcogenides · Magnetic properties of thin films
Methods: Diffusion
