TQ-DiT: Efficient Time-Aware Quantization for Diffusion Transformers
Younghye Hwang, Hyojin Lee, Joonhyuk Kang

TL;DR
This paper introduces TQ-DiT, a time-aware quantization method for diffusion transformers. It represents weights and activations with lower precision to cut computational cost while keeping generation quality close to that of the full-precision model, a step toward more efficient real-time AI applications.
Contribution
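The paper proposes two quantization techniques tailored to diffusion transformers: multi-region quantization (MRQ), which handles the asymmetric value distributions in DiT blocks, and time-grouping quantization (TGQ), which handles the variation of activations across timesteps. Together they improve efficiency with minimal performance loss.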
Findings
Achieves performance comparable to the full-precision model at W8A8 (8-bit weights and 8-bit activations), with only a 0.29 increase in FID.
Outperforms baseline methods at W6A6 quantization.
Demonstrates potential for real-time generative AI applications.
Abstract
Diffusion transformers (DiTs) combine transformer architectures with diffusion models. However, their computational complexity imposes significant limitations on real-time applications and sustainability of AI systems. In this study, we aim to enhance the computational efficiency through model quantization, which represents the weights and activation values with lower precision. Multi-region quantization (MRQ) is introduced to address the asymmetric distribution of network values in DiT blocks by allocating two scaling parameters to sub-regions. Additionally, time-grouping quantization (TGQ) is proposed to reduce quantization error caused by temporal variation in activations. The experimental results show that the proposed algorithm achieves performance comparable to the original full-precision model with only a 0.29 increase in FID at W8A8. Furthermore, it outperforms other baselines…
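The abstract describes MRQ and TGQ only at a high level. The NumPy sketch below illustrates the two ideas under our own assumptions; it is not the authors' implementation. For MRQ we assume the boundary between the two sub-regions is fixed at 0 and each region's scale is calibrated from its own min/max. For TGQ we assume timesteps are split into contiguous, equal-size groups, each with one activation scale calibrated from the maximum absolute activation in that group. All function names (`mrq_fake_quant`, `calibrate_time_groups`, `tgq_fake_quant`, etc.) and the synthetic data are hypothetical.

```python
import numpy as np


def uniform_quant(x, scale, qmin, qmax):
    """Fake-quantize: map to integer codes, clip to [qmin, qmax], map back."""
    return np.clip(np.round(x / scale), qmin, qmax) * scale


def symmetric_fake_quant(x, bits=8):
    """Baseline: a single scale for the whole tensor."""
    qmax = 2 ** (bits - 1) - 1
    s = max(np.abs(x).max(), 1e-8) / qmax
    return uniform_quant(x, s, -qmax - 1, qmax)


def mrq_fake_quant(x, bits=8):
    """Hypothetical two-region quantizer (assumed boundary fixed at 0).

    Negative values get codes in [-2**(bits-1), 0] and non-negative values get
    codes in [0, 2**(bits-1) - 1], each region with its own scale. The sign of
    the code identifies the region, so codes still fit in a signed integer.
    """
    levels = 2 ** (bits - 1)
    out = np.zeros_like(x, dtype=np.float64)
    neg = x < 0
    if neg.any():
        s_neg = max(-x[neg].min(), 1e-8) / levels        # fine scale for the narrow negative tail
        out[neg] = uniform_quant(x[neg], s_neg, -levels, 0)
    if (~neg).any():
        s_pos = max(x[~neg].max(), 1e-8) / (levels - 1)  # coarser scale for the wide positive range
        out[~neg] = uniform_quant(x[~neg], s_pos, 0, levels - 1)
    return out


def calibrate_time_groups(acts_per_t, num_groups, bits=8):
    """Hypothetical TGQ calibration: split timesteps 0..T-1 into contiguous
    groups and give each group one scale from the max |activation| it sees."""
    T = len(acts_per_t)
    qmax = 2 ** (bits - 1) - 1
    group_of_t = np.empty(T, dtype=int)
    scales = []
    for gi, ts in enumerate(np.array_split(np.arange(T), num_groups)):
        group_of_t[ts] = gi
        amax = max(np.abs(acts_per_t[t]).max() for t in ts)
        scales.append(max(amax, 1e-8) / qmax)
    return np.array(scales), group_of_t


def tgq_fake_quant(x, t, scales, group_of_t, bits=8):
    """Quantize activations from timestep t with the scale of t's group."""
    qmax = 2 ** (bits - 1) - 1
    return uniform_quant(x, scales[group_of_t[t]], -qmax - 1, qmax)


rng = np.random.default_rng(0)

# Synthetic asymmetric data shaped like GELU outputs (tanh approximation):
# a narrow negative tail (about -0.17) and a wide positive range.
z = rng.normal(0.0, 3.0, size=10_000)
x = z * 0.5 * (1.0 + np.tanh(0.7978845608 * (z + 0.044715 * z**3)))
mrq_err = np.mean((mrq_fake_quant(x) - x) ** 2)
sym_err = np.mean((symmetric_fake_quant(x) - x) ** 2)

# Synthetic activations whose spread grows with the timestep index.
acts = [rng.normal(0.0, 1.0 + t, size=1_000) for t in range(10)]


def total_err(num_groups):
    scales, group_of_t = calibrate_time_groups(acts, num_groups)
    return sum(np.mean((tgq_fake_quant(a, t, scales, group_of_t) - a) ** 2)
               for t, a in enumerate(acts))


global_err, tgq_err = total_err(1), total_err(5)
print(f"MRQ MSE {mrq_err:.2e} vs single-scale MSE {sym_err:.2e}")
print(f"TGQ MSE {tgq_err:.2e} vs one scale for all timesteps {global_err:.2e}")
```

On this synthetic data, the single-scale quantizer spends its step size on the large positive range and rounds the small negative tail coarsely, while the two-region version resolves the tail with its own scale. Likewise, a single activation scale shared by all timesteps is dominated by the timesteps with the widest spread, whereas per-group scales stay fine where activations are small. How the paper actually chooses region boundaries, groups, and scales is not specified in the abstract.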
Taxonomy
Topics: Advanced Memory and Neural Computing · Neural Networks and Reservoir Computing · Photonic and Optical Devices
Methods: Diffusion
