HQ-DiT: Efficient Diffusion Transformer with FP4 Hybrid Quantization
Wenxuan Liu, Sai Qian Zhang

TL;DR
This paper introduces HQ-DiT, a novel 4-bit floating-point quantization method for Diffusion Transformers, significantly reducing model size and computation with minimal performance loss, enabling deployment on resource-constrained devices.
Contribution
The paper presents the first 4-bit FP quantization approach for both weights and activations in Diffusion Transformers, improving efficiency while maintaining performance.
Findings
Achieves 4-bit quantization with only a 0.12 increase in sFID on ImageNet.
Utilizes a novel clipping range selection mechanism for minimal quantization error.
Introduces a universal identity transform to reduce outlier-induced errors.
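To illustrate how an identity transform can reduce outlier-induced error, here is a minimal sketch. It assumes the transform is an orthonormal Hadamard rotation, which is one common choice; the summary above does not name the transform, so treat this as an assumption. The helper names (`hadamard`, `matvec`, `dot`) and the sample vectors are hypothetical. Because the rotation is orthogonal, applying it to both the activation and the weight vector leaves their dot product unchanged, while a single large activation is spread across all channels.

```python
import math

def hadamard(n):
    """Build an n x n orthonormal Hadamard matrix (Sylvester construction)."""
    assert n > 0 and n & (n - 1) == 0, "n must be a power of two"
    h = [[1.0]]
    while len(h) < n:
        # Double the size: [[H, H], [H, -H]]
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    norm = 1.0 / math.sqrt(n)
    return [[v * norm for v in row] for row in h]

def matvec(m, x):
    """Multiply matrix m (list of rows) by vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Hypothetical activation vector with one outlier channel, and a weight vector.
x = [0.1, -0.2, 0.05, 8.0, 0.3, -0.1, 0.2, 0.0]
w = [0.5, -0.3, 0.2, 0.1, -0.4, 0.6, 0.05, -0.2]

H = hadamard(8)
x_rot, w_rot = matvec(H, x), matvec(H, w)

# The layer output is preserved (up to floating-point rounding) ...
print(dot(x, w), dot(x_rot, w_rot))
# ... while the largest activation magnitude shrinks.
print(max(abs(v) for v in x), max(abs(v) for v in x_rot))
```

With a smaller dynamic range after rotation, a 4-bit grid can cover the values with a finer step, which is the intuition behind using such a transform before quantization.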
Abstract
Diffusion Transformers (DiTs) have recently gained substantial attention in both industrial and academic fields for their superior visual generation capabilities, outperforming traditional diffusion models that use U-Net. However, the enhanced performance of DiTs also comes with high parameter counts and implementation costs, seriously restricting their use on resource-limited devices such as mobile phones. To address these challenges, we introduce the Hybrid Floating-point Quantization for DiT (HQ-DiT), an efficient post-training quantization method that utilizes 4-bit floating-point (FP) precision on both weights and activations for DiT inference. Compared to fixed-point quantization (e.g., INT8), FP quantization, complemented by our proposed clipping range selection mechanism, naturally aligns with the data distribution within DiT, resulting in a minimal quantization error.…
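The idea of pairing FP4 quantization with a clipping range search can be sketched as follows. This is a simulation under stated assumptions, not the paper's mechanism: it assumes the E2M1 format (1 sign bit, 2 exponent bits, 1 mantissa bit), whose representable magnitudes are 0, 0.5, 1, 1.5, 2, 3, 4, and 6, and it uses a plain grid search over candidate clipping ranges that minimizes mean squared error. The function names and the `num_candidates` parameter are hypothetical.

```python
# Non-negative magnitudes representable in the E2M1 FP4 format (assumed format).
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def quantize_fp4(values, clip):
    """Simulate FP4: scale so `clip` maps to the largest grid value, snap, rescale."""
    if clip <= 0:
        return [0.0 for _ in values]
    scale = clip / E2M1_GRID[-1]
    out = []
    for v in values:
        mag = min(abs(v), clip) / scale                   # clip, then map onto the grid
        q = min(E2M1_GRID, key=lambda g: abs(g - mag))    # nearest representable value
        sign = -1.0 if v < 0 else 1.0
        out.append(sign * q * scale)
    return out

def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def search_clip(values, num_candidates=50):
    """Hypothetical grid search over clipping ranges in (0, max|x|]."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return 0.0, 0.0
    best_clip, best_err = max_abs, mse(values, quantize_fp4(values, max_abs))
    for i in range(1, num_candidates):
        clip = max_abs * i / num_candidates
        err = mse(values, quantize_fp4(values, clip))
        if err < best_err:
            best_clip, best_err = clip, err
    return best_clip, best_err

# Hypothetical data: many small values plus one large outlier.
values = [0.01 * i for i in range(-50, 51)] + [10.0]
best_clip, best_err = search_clip(values)
print(best_clip, best_err, mse(values, quantize_fp4(values, 10.0)))
```

The FP grid is denser near zero than a uniform integer grid with the same bit width, which is why it can suit bell-shaped weight and activation distributions; the clipping search then trades a little saturation error on outliers for finer resolution on the bulk of the values.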
Taxonomy
Topics: Photonic and Optical Devices · Advanced Fiber Optic Sensors · Optical Systems and Laser Technology
Methods: Concatenated Skip Connection · Max Pooling · Convolution · U-Net · Diffusion
