Training-free Mixed-Resolution Latent Upsampling for Spatially Accelerated Diffusion Transformers

Wongi Jeong; Kyungryeol Lee; Hoigi Seo; Se Young Chun

arXiv:2507.08422·cs.CV·February 26, 2026

TL;DR

This paper introduces RALU, a training-free spatial acceleration framework for diffusion transformers. It cuts computational cost through mixed-resolution latent upsampling, reporting speedups of up to 7.0× on FLUX-1.dev and 3.0× on Stable Diffusion 3.

Contribution

The paper proposes RALU, a training-free spatial acceleration method for diffusion transformers. It targets the artifacts that naïve latent upsampling introduces, which the authors attribute to aliasing in high-frequency edge regions and to noise-timestep mismatch.

Findings

01. Achieves up to 7.0× speedup on FLUX-1.dev

02. Achieves up to 3.0× speedup on Stable Diffusion 3

03. Achieves up to 15.9× combined speedup with existing methods

Abstract

Diffusion transformers (DiTs) offer excellent scalability for high-fidelity generation, but their computational overhead poses a great challenge for practical deployment. Existing acceleration methods primarily exploit the temporal dimension, whereas spatial acceleration remains underexplored. In this work, we investigate spatial acceleration for DiTs via latent upsampling. We find that naïve latent upsampling for spatial acceleration introduces artifacts, primarily due to aliasing in high-frequency edge regions and mismatch from noise-timestep discrepancies. Based on these findings and analyses, we propose a training-free spatial acceleration framework, dubbed Region-Adaptive Latent Upsampling (RALU), that mitigates those artifacts while achieving spatial acceleration of DiTs through mixed-resolution latent upsampling. RALU achieves artifact-free, efficient acceleration with…
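To make the idea of mixed-resolution latents concrete, here is a minimal NumPy sketch of one plausible reading of the abstract: score low-resolution latent patches by edge strength, mark the strongest ones for upsampling, and count how many tokens a transformer would then process. This is not the authors' implementation. The function names, the gradient-based edge score, the `patch`, `ratio`, and `factor` parameters, and the nearest-neighbor upsampler are all assumptions chosen for illustration.

```python
import numpy as np

def edge_strength(latent):
    """Gradient magnitude of a (C, H, W) latent, averaged over channels.
    Assumption: a plain finite-difference gradient stands in for an edge detector."""
    gy, gx = np.gradient(latent, axis=(1, 2))
    return np.sqrt(gx ** 2 + gy ** 2).mean(axis=0)

def select_edge_patches(latent, patch=2, ratio=0.25):
    """Boolean (H // patch, W // patch) mask of the patches with the highest
    mean edge strength. Ties at the threshold are all kept."""
    _, h, w = latent.shape
    scores = edge_strength(latent)
    scores = scores.reshape(h // patch, patch, w // patch, patch).mean(axis=(1, 3))
    k = max(1, int(round(ratio * scores.size)))
    threshold = np.partition(scores.ravel(), -k)[-k]  # k-th largest score
    return scores >= threshold

def upsample_nearest(latent, factor=2):
    """Nearest-neighbor upsampling of a (C, H, W) latent (hypothetical stand-in)."""
    return latent.repeat(factor, axis=1).repeat(factor, axis=2)

def mixed_resolution_token_count(latent, patch=2, ratio=0.25, factor=2):
    """Return the edge mask and the token count when only edge patches are
    upsampled: one token per low-resolution patch, factor**2 per edge patch."""
    mask = select_edge_patches(latent, patch, ratio)
    n_edge = int(mask.sum())
    n_flat = int(mask.size - n_edge)
    return mask, n_flat + n_edge * factor ** 2

# Toy latent with a single vertical step edge down the middle.
latent = np.zeros((4, 8, 8))
latent[:, :, 4:] = 1.0
mask, n_tokens = mixed_resolution_token_count(latent, patch=2, ratio=0.5)
n_full = mask.size * 2 ** 2  # token count if every patch were upsampled
print(mask.astype(int))
print(n_tokens, "tokens mixed-resolution vs", n_full, "at full resolution")
```

In this toy case only the patches straddling the step edge are marked, so the token count falls below that of a fully upsampled latent. Since attention cost grows with sequence length, this reduction is what a spatial acceleration scheme trades on.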
