DiRotQ: Rotation-Aware Quantization for 4-bit Diffusion Transformers

Sayeh Sharify; Mahsa Salmani; Hesham Mostafa

arXiv:2605.16732 · cs.CV · May 19, 2026

TL;DR

DiRotQ is a rotation-aware post-training quantization method for 4-bit (W4A4) diffusion transformers. It combines PCA-based activation compression with optimized inference kernels to improve efficiency while maintaining high image quality.

Contribution

The paper proposes DiRotQ, a novel PCA-based rotation-aware quantization framework for diffusion transformers, enhancing 4-bit inference quality and efficiency.

Findings

01. Achieves lower FID and higher PSNR compared to prior methods.

02. Reduces memory usage by 2.1x and speeds up inference by 2.3x.

03. Introduces a new holistic evaluation protocol for quantized diffusion models.

Abstract

Diffusion Transformers (DiTs) achieve state-of-the-art image generation quality but incur substantial memory and computational costs at inference. While aggressive Post-Training Quantization (PTQ) to 4-bit precision offers significant efficiency gains, it typically results in severe quality degradation. Existing approaches, including smoothing-based methods, mixed-precision schemes, rotation techniques, and low-rank residual methods, partially mitigate this issue but still leave a noticeable gap to FP16/BF16 performance. In this work, we introduce DiRotQ, a W4A4 PTQ framework that mitigates this degradation through rotation-aware activation quantization. DiRotQ identifies a low-rank subspace capturing dominant activation variance via Principal Component Analysis (PCA), preserving coefficients in this subspace at higher precision while quantizing the remaining components to 4-bit.…
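
The core idea in the abstract (keep the coefficients in a dominant PCA subspace at higher precision, quantize the rest to 4-bit) can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names, the symmetric per-row scaling, the simulated (quantize-dequantize) arithmetic, the rank of 4, and the synthetic activations are all assumptions chosen for illustration.

```python
import numpy as np


def pca_rotation(calib):
    """Orthonormal basis (hidden x hidden); columns sorted by descending variance."""
    centered = calib - calib.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / (len(calib) - 1)
    _, eigvecs = np.linalg.eigh(cov)  # eigh returns eigenvalues in ascending order
    return eigvecs[:, ::-1]


def fake_quant_int4(x):
    """Symmetric per-row quantize-dequantize onto the 4-bit integer grid [-8, 7]."""
    scale = np.abs(x).max(axis=-1, keepdims=True) / 7.0
    scale = np.where(scale == 0.0, 1.0, scale)  # avoid division by zero
    return np.clip(np.round(x / scale), -8, 7) * scale


def mixed_precision_quant(x, basis, rank):
    """Rotate into the PCA basis, keep the top-`rank` coefficients, 4-bit the rest."""
    coeffs = x @ basis
    coeffs_hat = np.concatenate(
        [coeffs[:, :rank], fake_quant_int4(coeffs[:, rank:])], axis=1
    )
    return coeffs_hat @ basis.T  # rotate back to the original space


# Hypothetical activations: a few high-variance directions plus small noise.
rng = np.random.default_rng(0)
factors = 20.0 * rng.normal(size=(1024, 4))
mixing = rng.normal(size=(4, 64))
acts = factors @ mixing + rng.normal(size=(1024, 64))
calib, held_out = acts[:512], acts[512:]

basis = pca_rotation(calib)
err_plain = np.mean((held_out - fake_quant_int4(held_out)) ** 2)
err_mixed = np.mean((held_out - mixed_precision_quant(held_out, basis, rank=4)) ** 2)
print(f"plain 4-bit MSE: {err_plain:.4f}, PCA-split MSE: {err_mixed:.4f}")
```

In this toy setting the dominant directions set the quantization scale for plain 4-bit rounding, so moving them into a separately stored subspace shrinks the range the 4-bit grid has to cover. How DiRotQ selects the rank, stores the high-precision coefficients, and fuses the rotation into its kernels is not specified in the text above.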
