Just-in-Time: Training-Free Spatial Acceleration for Diffusion Transformers
Wenhao Sun, Ji Li, Zhaoqiang Liu

TL;DR
This paper introduces JiT, a training-free spatial acceleration framework for diffusion transformers that exploits spatial redundancy to speed up image synthesis by up to 7x with minimal loss in quality.
Contribution
JiT is a novel, training-free method that accelerates diffusion transformers by approximating spatial computations with sparse anchor tokens and a deterministic micro-flow.
Findings
Achieves up to 7x speedup in image synthesis.
Maintains nearly lossless performance.
Outperforms existing acceleration methods.
Abstract
Diffusion Transformers have established a new state-of-the-art in image synthesis, but the high computational cost of iterative sampling severely hampers their practical deployment. While existing acceleration methods often focus on the temporal domain, they overlook the substantial spatial redundancy inherent in the generative process, where global structures emerge long before fine-grained details are formed. The uniform computational treatment of all spatial regions represents a critical inefficiency. In this paper, we introduce Just-in-Time (JiT), a novel training-free framework that addresses this challenge by accelerating generation in the spatial domain. JiT formulates a spatially approximated generative ordinary differential equation (ODE) that drives the full latent state evolution based on computations from a dynamically selected, sparse subset of anchor tokens. To ensure seamless…
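The idea of driving the full latent state from a sparse set of anchor tokens can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not say how anchors are selected or how their outputs are propagated, so random anchor selection, nearest-anchor copying of the velocity, the plain Euler step, and every name below (toy_velocity, select_anchors, approx_euler_step, grid_positions) are assumptions made purely for illustration.

```python
import numpy as np


def toy_velocity(tokens, t):
    """Hypothetical stand-in for a diffusion transformer's velocity prediction."""
    return -tokens * (1.0 - t)


def grid_positions(height, width):
    """Return (height * width, 2) array of token coordinates on a 2D grid."""
    rows, cols = np.meshgrid(np.arange(height), np.arange(width), indexing="ij")
    return np.stack([rows, cols], axis=-1).reshape(-1, 2).astype(float)


def select_anchors(num_tokens, num_anchors, rng):
    """Assumption: anchors are drawn uniformly at random (the paper's rule may differ)."""
    return np.sort(rng.choice(num_tokens, size=num_anchors, replace=False))


def approx_euler_step(x, positions, t, dt, anchor_idx, velocity_fn=toy_velocity):
    """One Euler step of dx/dt = v(x, t), evaluating v only on anchor tokens.

    Assumption: every token reuses the velocity of its spatially nearest anchor.
    """
    v_anchor = velocity_fn(x[anchor_idx], t)          # (A, D): the only "expensive" call
    diff = positions[:, None, :] - positions[anchor_idx][None, :, :]
    nearest = np.linalg.norm(diff, axis=-1).argmin(axis=1)  # (N,) index into anchors
    v_full = v_anchor[nearest]                        # (N, D): approximated velocity field
    return x + dt * v_full


rng = np.random.default_rng(0)
positions = grid_positions(4, 4)                      # 16 tokens on a 4x4 grid
latents = rng.normal(size=(16, 8))                    # 8 channels per token
anchors = select_anchors(16, 4, rng)                  # model runs on 4 of 16 tokens
latents_next = approx_euler_step(latents, positions, t=0.5, dt=0.1, anchor_idx=anchors)
```

In this sketch the cost of the model call scales with the number of anchors rather than the number of tokens, which is where a spatial speedup would come from; when every token is an anchor, the step reduces to an ordinary Euler step.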
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Model Reduction and Neural Networks · Music Technology and Sound Studies
