Chipmunk: Training-Free Acceleration of Diffusion Transformers with Dynamic Column-Sparse Deltas
Austin Silveria, Soham V. Govande, Daniel Y. Fu

TL;DR
Chipmunk is a training-free method that accelerates diffusion transformers by recomputing only the fastest-changing activations at each inference step and caching the rest. It combines dynamic sparsity with efficient GPU kernels to reduce inference time while maintaining generation quality.
Contribution
This work presents Chipmunk, a novel inference-time acceleration technique for diffusion transformers using dynamic column-sparse deltas, without requiring additional training.
Findings
Achieves up to 3.72x speedup on diffusion models.
Maintains high generation quality despite acceleration.
Efficient GPU kernels enable practical deployment.
Abstract
Diffusion Transformers (DiTs) have achieved state-of-the-art performance in high-quality image and video generation but incur substantial compute cost at inference. A common observation is that DiT latent noise vectors change slowly across inference steps, which suggests that the DiT compute may be redundant across steps. In this paper, we aim to speed up inference by reducing this redundancy, without additional training. We first study how activations change between steps in two state-of-the-art open-source DiTs. We find that just 5-25% of the values in attention and MLP explain 70-90% of the change in activations across steps. This finding motivates our approach, Chipmunk, which uses dynamic sparsity at inference time to recompute only the fastest-changing intermediate activations, while caching the rest. Dynamic sparsity introduces two systems challenges: (1) sparse attention and MLP…
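The idea of recomputing only the fastest-changing activations while caching the rest can be illustrated on a toy two-layer MLP. What follows is a minimal sketch in pure Python, not the authors' implementation or GPU kernels. The function names (dense_mlp, sparse_delta_mlp, top_changing_columns), the ReLU activation, the tiny weight matrices, and the selection rule (ranking hidden columns by absolute change between two dense steps) are all assumptions made for illustration.

```python
import math

def relu(v):
    return v if v > 0.0 else 0.0

def dense_mlp(X, W1, W2):
    """Full computation: return hidden activations H and outputs Y per token."""
    m, d_out = len(W2), len(W2[0])
    H = [[relu(sum(x[i] * W1[i][j] for i in range(len(x)))) for j in range(m)]
         for x in X]
    Y = [[sum(h[j] * W2[j][o] for j in range(m)) for o in range(d_out)]
         for h in H]
    return H, Y

def sparse_delta_mlp(X_new, W1, W2, H_cache, Y_cache, cols):
    """Recompute only the hidden columns in `cols` and add their change
    (the delta) to the cached output; all other columns reuse the cache."""
    H = [row[:] for row in H_cache]
    Y = [row[:] for row in Y_cache]
    for t, x in enumerate(X_new):
        for j in cols:
            h_new = relu(sum(x[i] * W1[i][j] for i in range(len(x))))
            delta = h_new - H[t][j]
            for o in range(len(Y[t])):
                Y[t][o] += delta * W2[j][o]
            H[t][j] = h_new  # refresh the cache for this column
    return H, Y

def top_changing_columns(H_prev, H_curr, k):
    """Hypothetical selection rule: rank hidden columns by their total
    absolute change across tokens and keep the k largest."""
    m = len(H_curr[0])
    change = [sum(abs(hc[j] - hp[j]) for hp, hc in zip(H_prev, H_curr))
              for j in range(m)]
    return sorted(range(m), key=lambda j: change[j], reverse=True)[:k]

# Toy weights: 2 input features, 4 hidden columns, 2 output features.
W1 = [[1.0, -1.0, 0.5, 2.0],
      [0.5, 1.0, -1.0, 0.0]]
W2 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.5]]

X_step0 = [[1.0, 2.0], [0.5, -1.0]]   # inputs at one inference step
X_step1 = [[1.1, 2.0], [0.5, -0.9]]   # slightly changed inputs at the next

H0, Y0 = dense_mlp(X_step0, W1, W2)   # dense step fills the cache
H1, Y1 = dense_mlp(X_step1, W1, W2)   # dense reference for comparison
cols = top_changing_columns(H0, H1, k=2)
H_sparse, Y_sparse = sparse_delta_mlp(X_step1, W1, W2, H0, Y0, cols)
```

In this sketch the selected columns are shared by all tokens, which is what makes the sparsity column-wise. Selecting every column reproduces the dense output up to floating-point rounding, and selecting none returns the cached output unchanged; anything in between trades accuracy for less computation.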
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Image Enhancement Techniques · Advanced Memory and Neural Computing
