TL;DR
This paper introduces Q-DiT4SR, a post-training quantization (PTQ) framework designed for diffusion-transformer-based real-world image super-resolution. It reports state-of-the-art results while reducing model size by 5.8× and computation by 6.14× under the W4A4 setting.
Contribution
The paper presents the first PTQ method tailored to DiT-based super-resolution. It combines H-SVD, a hierarchical low-rank approximation, with VaSMP and VaTMP for mixed-precision allocation and scheduling.
Findings
Achieves state-of-the-art performance on multiple datasets.
Reduces model size by 5.8× and computation by 6.14× under the W4A4 setting (4-bit weights, 4-bit activations).
Demonstrates effective local texture preservation in quantized models.
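To make the W4A4 setting concrete, here is a minimal sketch of 4-bit uniform quantization applied to a tensor. It is an illustration only: the symmetric per-tensor scheme and the helper name `fake_quantize` are assumptions, not the quantizer used in Q-DiT4SR.

```python
import numpy as np

def fake_quantize(x, bits=4):
    """Round x onto a signed `bits`-bit integer grid, then map back to floats.

    Hypothetical helper: symmetric, per-tensor scaling is an assumption
    made for illustration, not the paper's quantizer.
    """
    qmax = 2 ** (bits - 1) - 1            # 7 for 4 bits
    scale = np.max(np.abs(x)) / qmax      # one scale for the whole tensor
    if scale == 0:
        return x.copy()                   # all-zero input: nothing to quantize
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q * scale

# In a W4A4 layer, both the weights and the activations pass through
# a quantizer like this before the matrix multiply.
weights = np.linspace(-1.0, 1.0, 101)
weights_q = fake_quantize(weights, bits=4)
```

With 4 bits the tensor can take at most 16 distinct values, which is why fine detail is easy to lose and why low-rank and mixed-precision compensation become relevant.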
Abstract
Recently, Diffusion Transformers (DiTs) have emerged in Real-World Image Super-Resolution (Real-ISR) to generate high-quality textures, yet their heavy inference burden hinders real-world deployment. While Post-Training Quantization (PTQ) is a promising solution for acceleration, existing methods in super-resolution mostly focus on U-Net architectures, whereas generic DiT quantization is typically designed for text-to-image tasks. Directly applying these methods to DiT-based super-resolution models leads to severe degradation of local textures. Therefore, we propose Q-DiT4SR, the first PTQ framework specifically tailored for DiT-based Real-ISR. We propose H-SVD, a hierarchical SVD that integrates a global low-rank branch with a local block-wise rank-1 branch under a matched parameter budget. We further propose Variance-aware Spatio-Temporal Mixed Precision: VaSMP allocates cross-layer…
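The H-SVD idea in the abstract, a global low-rank branch combined with a local block-wise rank-1 branch, can be sketched as follows. This is a rough illustration under stated assumptions, not the authors' implementation: fitting the local branch to the residual of the global branch, the square block size, and the names `truncated_svd` and `h_svd_sketch` are all hypothetical choices.

```python
import numpy as np

def truncated_svd(mat, rank):
    """Best rank-`rank` approximation of `mat` in the Frobenius norm."""
    u, s, vt = np.linalg.svd(mat, full_matrices=False)
    return (u[:, :rank] * s[:rank]) @ vt[:rank]

def h_svd_sketch(weight, global_rank, block):
    """Global low-rank branch plus one rank-1 factor per (block x block) tile.

    Assumption: the local branch is fitted to the residual left by the
    global branch. The paper may combine the two branches differently.
    """
    rows, cols = weight.shape
    assert rows % block == 0 and cols % block == 0, "block must tile the matrix"

    global_part = truncated_svd(weight, global_rank)
    residual = weight - global_part

    local_part = np.zeros_like(weight)
    for i in range(0, rows, block):
        for j in range(0, cols, block):
            tile = residual[i:i + block, j:j + block]
            local_part[i:i + block, j:j + block] = truncated_svd(tile, 1)

    # Parameter count: rank-r factors cost r * (rows + cols);
    # each rank-1 tile costs 2 * block.
    n_params = global_rank * (rows + cols) \
        + (rows // block) * (cols // block) * 2 * block
    return global_part + local_part, n_params

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64))
approx, n_params = h_svd_sketch(w, global_rank=4, block=16)
```

In this example the two branches together use 1024 parameters, the same count as a plain rank-8 SVD of the 64×64 matrix (8 × 128), which is one way to read "matched parameter budget".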
