DiTAS: Quantizing Diffusion Transformers via Enhanced Activation Smoothing
Zhenyuan Dong, Sai Qian Zhang

TL;DR
DiTAS introduces a data-free post-training quantization method for diffusion transformers, employing activation smoothing and weight quantization techniques to enable efficient low-bit inference with minimal performance loss.
Contribution
The paper presents a quantization approach for diffusion transformers that combines temporal-aggregated activation smoothing, a layer-wise grid search, and a training-free weight quantization module.
Findings
Enables 4-bit weight and 8-bit activation quantization for DiTs.
Maintains performance comparable to full-precision models after quantization.
Reduces implementation costs, facilitating deployment on resource-constrained devices.
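To make the 4-bit weight / 8-bit activation setting concrete, here is a minimal sketch of symmetric uniform quantization in NumPy. It is an illustration under stated assumptions, not the paper's quantizer: the helper name `fake_quantize`, the per-output-channel scale for weights, and the per-tensor scale for activations are all choices made for this example.

```python
import numpy as np

def fake_quantize(x, n_bits, axis=None):
    """Symmetric uniform quantize-then-dequantize (hypothetical helper).

    axis=None uses one scale for the whole tensor; otherwise one scale
    is computed per slice along `axis`.
    """
    qmax = 2 ** (n_bits - 1) - 1                      # 7 for 4-bit, 127 for 8-bit
    max_abs = np.max(np.abs(x), axis=axis, keepdims=axis is not None)
    scale = np.maximum(max_abs, 1e-8) / qmax          # guard against all-zero input
    q = np.clip(np.round(x / scale), -qmax, qmax)     # integer grid
    return q * scale                                  # back to floating point

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16))    # (out_features, in_features), toy sizes
X = rng.normal(size=(32, 16))   # (tokens, in_features)

W_q = fake_quantize(W, n_bits=4, axis=1)  # assumed: one scale per output channel
X_q = fake_quantize(X, n_bits=8)          # assumed: one scale per tensor

# Relative error of the quantized linear layer against full precision.
ref = X @ W.T
rel_err = np.linalg.norm(X_q @ W_q.T - ref) / np.linalg.norm(ref)
print(f"relative output error (W4A8): {rel_err:.4f}")
```

With only 15 usable levels at 4 bits, a single large value in a row stretches the scale and wastes resolution on everything else, which is why outlier handling matters at these bitwidths.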
Abstract
Diffusion Transformers (DiTs) have recently attracted significant interest from both industry and academia due to their enhanced capabilities in visual generation, surpassing the performance of traditional diffusion models that employ U-Net. However, the improved performance of DiTs comes at the expense of higher parameter counts and implementation costs, which significantly limits their deployment on resource-constrained devices like mobile phones. We propose DiTAS, a data-free post-training quantization (PTQ) method for efficient DiT inference. DiTAS relies on the proposed temporal-aggregated smoothing techniques to mitigate the impact of the channel-wise outliers within the input activations, leading to much lower quantization error under extremely low bitwidth. To further enhance the performance of the quantized DiT, we adopt the layer-wise grid search strategy to optimize the…
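The smoothing idea in the abstract can be sketched as follows. This is a hedged illustration, not the paper's algorithm: it assumes a SmoothQuant-style per-channel scale that moves outlier magnitude from activations into weights, and it assumes that "temporal aggregation" means combining per-channel activation statistics across denoising timesteps (here with a simple maximum). The function name `smoothing_factors`, the exponent `alpha`, and the aggregation rule are hypothetical.

```python
import numpy as np

def smoothing_factors(acts_per_step, weight, alpha=0.5, eps=1e-8):
    """Per-input-channel smoothing scales (hypothetical helper).

    acts_per_step: list of (tokens, in_features) arrays, one per timestep.
    weight:        (out_features, in_features) array.
    """
    # Per-channel activation magnitude at each timestep: shape (T, in_features).
    per_step_max = np.stack([np.abs(a).max(axis=0) for a in acts_per_step])
    # Assumed aggregation over timesteps: element-wise maximum.
    act_max = per_step_max.max(axis=0)
    # Per-input-channel weight magnitude.
    w_max = np.abs(weight).max(axis=0)
    return (act_max + eps) ** alpha / (w_max + eps) ** (1 - alpha)

rng = np.random.default_rng(0)
T, tokens, d_in, d_out = 4, 32, 16, 8
acts = [rng.normal(size=(tokens, d_in)) for _ in range(T)]
for a in acts:
    a[:, 3] *= 50.0                      # inject one outlier channel
W = rng.normal(size=(d_out, d_in))

s = smoothing_factors(acts, W)
x = acts[0]
x_s = x / s                              # smoothed activations
W_s = W * s                              # scale folded into weight columns

# The layer output is unchanged in full precision.
assert np.allclose(x @ W.T, x_s @ W_s.T)
print("activation max before:", np.abs(x).max(), "after:", np.abs(x_s).max())
```

Because the rescaling cancels exactly in full precision, its only effect is to flatten the activation range before quantization, at the cost of a somewhat wider weight range in the affected channels. The exponent `alpha` trades off between the two.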
Taxonomy
Topics: Advanced Memory and Neural Computing
Methods: Diffusion
