DiRotQ: Rotation-Aware Quantization for 4-bit Diffusion Transformers
Sayeh Sharify, Mahsa Salmani, Hesham Mostafa

TL;DR
DiRotQ is a rotation-aware post-training quantization method for 4-bit (W4A4) diffusion transformers. It keeps a PCA-identified low-rank activation subspace at higher precision, quantizes the remaining components to 4-bit, and uses optimized inference kernels to improve efficiency while maintaining image quality.
Contribution
The paper proposes DiRotQ, a PCA-based, rotation-aware quantization framework for diffusion transformers that improves both the quality and the efficiency of 4-bit inference.
Findings
Achieves lower FID and higher PSNR compared to prior methods.
Reduces memory usage by 2.1x and speeds up inference by 2.3x.
Introduces a new holistic evaluation protocol for quantized diffusion models.
Abstract
Diffusion Transformers (DiTs) achieve state-of-the-art image generation quality but incur substantial memory and computational costs at inference. While aggressive Post-Training Quantization (PTQ) to 4-bit precision offers significant efficiency gains, it typically results in severe quality degradation. Existing approaches, including smoothing-based methods, mixed-precision schemes, rotation techniques, and low-rank residual methods, partially mitigate this issue but still leave a noticeable gap to FP16/BF16 performance. In this work, we introduce DiRotQ, a W4A4 PTQ framework that mitigates this degradation through rotation-aware activation quantization. DiRotQ identifies a low-rank subspace capturing dominant activation variance via Principal Component Analysis (PCA), preserving coefficients in this subspace at higher precision while quantizing the remaining components to 4-bit.…
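The core idea in the abstract (keep the coefficients along dominant PCA directions at higher precision, quantize the rest to 4-bit) can be illustrated with a small sketch. This is not the authors' implementation: it assumes a rank-1 subspace found by power iteration, a full-precision coefficient, a symmetric per-vector 4-bit grid, and synthetic calibration data with one outlier channel. It also omits the rotation step and the inference kernels. All function names (`quantize_int4`, `top_direction`, `split_quantize`, `squared_error`) are hypothetical.

```python
import math
import random


def quantize_int4(values):
    """Symmetric fake quantization: snap values to a 4-bit grid in [-8, 7]."""
    max_abs = max((abs(v) for v in values), default=0.0)
    if max_abs == 0.0:
        return [0.0 for _ in values]
    scale = max_abs / 7.0
    return [max(-8, min(7, round(v / scale))) * scale for v in values]


def top_direction(rows, iters=50, seed=0):
    """Leading principal direction of `rows`, found by power iteration."""
    n, dim = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(dim)]
    centered = [[r[j] - mean[j] for j in range(dim)] for r in rows]
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        # One step of v <- X^T (X v), then renormalize to unit length.
        proj = [sum(a * b for a, b in zip(r, v)) for r in centered]
        w = [sum(p * r[j] for p, r in zip(proj, centered)) for j in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


def split_quantize(x, direction):
    """Keep the coefficient along `direction` exact; quantize the residual."""
    coeff = sum(a * d for a, d in zip(x, direction))  # assumed full precision
    residual = [a - coeff * d for a, d in zip(x, direction)]
    q_res = quantize_int4(residual)  # 4-bit part
    return [coeff * d + q for d, q in zip(direction, q_res)]


def squared_error(rows, quantizer):
    """Total squared reconstruction error of `quantizer` over all rows."""
    return sum((a - b) ** 2 for r in rows for a, b in zip(r, quantizer(r)))


# Synthetic activations (an assumption): channel 0 is a large outlier channel.
rng = random.Random(0)
calib = [[rng.gauss(0.0, 50.0)] + [rng.gauss(0.0, 1.0) for _ in range(7)]
         for _ in range(64)]

direction = top_direction(calib)
plain_err = squared_error(calib, quantize_int4)
split_err = squared_error(calib, lambda r: split_quantize(r, direction))
print(f"plain 4-bit error: {plain_err:.3f}")
print(f"split 4-bit error: {split_err:.3f}")
```

In this toy setting the outlier channel stretches the 4-bit scale so far that the small channels mostly round to zero. Removing the dominant direction first lets the residual use a much finer grid, which is the intuition behind preserving the high-variance subspace at higher precision.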
