Training-free Mixed-Resolution Latent Upsampling for Spatially Accelerated Diffusion Transformers
Wongi Jeong, Kyungryeol Lee, Hoigi Seo, Se Young Chun

TL;DR
This paper introduces RALU, a training-free spatial acceleration framework for diffusion transformers that reduces computational cost through mixed-resolution latent upsampling, achieving significant speedups with minimal quality loss.
Contribution
The paper proposes RALU, a training-free spatial acceleration method that mitigates the artifacts of latent upsampling in diffusion transformers, namely aliasing in high-frequency edge regions and mismatches from noise-timestep discrepancies.
Findings
Achieves up to 7.0× speedup on FLUX.1-dev
Achieves up to 3.0× speedup on Stable Diffusion 3
Up to 15.9× combined speedup with existing methods
Abstract
Diffusion transformers (DiTs) offer excellent scalability for high-fidelity generation, but their computational overhead poses a great challenge for practical deployment. Existing acceleration methods primarily exploit the temporal dimension, whereas spatial acceleration remains underexplored. In this work, we investigate spatial acceleration for DiTs via latent upsampling. We found that naïve latent upsampling for spatial acceleration introduces artifacts, primarily due to aliasing in high-frequency edge regions and mismatching from noise-timestep discrepancies. Then, based on these findings and analyses, we propose a training-free spatial acceleration framework, dubbed Region-Adaptive Latent Upsampling (RALU), to mitigate those artifacts while achieving spatial acceleration of DiTs by our mixed-resolution latent upsampling. RALU achieves artifact-free, efficient acceleration with…
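To make the idea of spatial acceleration concrete, the sketch below illustrates the naïve baseline that the abstract criticizes, not RALU itself. It assumes a latent stored as a NumPy array of shape (channels, height, width), nearest-neighbor upsampling as the naïve method, and a hypothetical patch size of 2 for tokenization; the function names are illustrative and do not come from the paper.

```python
import numpy as np


def upsample_nearest(latent, factor=2):
    """Naive nearest-neighbor upsampling of a (C, H, W) latent.

    Each spatial value is copied into a factor x factor block. This is an
    illustrative baseline only, not the region-adaptive scheme of the paper.
    """
    return latent.repeat(factor, axis=1).repeat(factor, axis=2)


def num_tokens(height, width, patch=2):
    """Number of transformer tokens for a latent, assuming square patches.

    The patch size of 2 is an assumption made for this example.
    """
    return (height // patch) * (width // patch)


# A small low-resolution latent: 2 channels, 4 x 4 spatial grid.
low = np.arange(2 * 4 * 4, dtype=np.float32).reshape(2, 4, 4)
high = upsample_nearest(low, factor=2)

# Halving each spatial side cuts the token count by 4.
tokens_low = num_tokens(64, 64)
tokens_high = num_tokens(128, 128)
token_ratio = tokens_high // tokens_low
```

Denoising at half the side length processes a quarter of the tokens, and since self-attention cost grows roughly quadratically with token count, the early low-resolution steps are much cheaper. The block copies that nearest-neighbor upsampling creates are one simple way to see why naïve upsampling can produce artifacts near sharp edges.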
