Chipmunk: Training-Free Acceleration of Diffusion Transformers with Dynamic Column-Sparse Deltas

Austin Silveria; Soham V. Govande; Daniel Y. Fu

arXiv:2506.03275·cs.CV·June 5, 2025

Open Access

TL;DR

Chipmunk is a training-free method that accelerates diffusion transformers by recomputing only the fastest-changing activations at each inference step and caching the rest. It combines dynamic sparsity with GPU-level optimizations to reduce inference time while maintaining generation quality.

Contribution

This work presents Chipmunk, an inference-time acceleration technique for diffusion transformers that applies dynamic column-sparse deltas and requires no additional training.

Findings

01. Achieves up to 3.72x speedup on diffusion models.

02. Maintains high generation quality despite acceleration.

03. Efficient GPU kernels enable practical deployment.

Abstract

Diffusion Transformers (DiTs) have achieved state-of-the-art performance in high-quality image and video generation but incur substantial compute cost at inference. A common observation is that DiT latent noise vectors change slowly across inference steps, which suggests that the DiT compute may be redundant across steps. In this paper, we aim to speed up inference by reducing this redundancy, without additional training. We first study how activations change between steps in two state-of-the-art open-source DiTs. We find that just 5-25% of the values in attention and MLP explain 70-90% of the change in activations across steps. This finding motivates our approach, Chipmunk, which uses dynamic sparsity at inference time to recompute only the fastest-changing intermediate activations, while caching the rest. Dynamic sparsity introduces two systems challenges: (1) sparse attention and MLP…
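
The caching idea in the abstract can be illustrated with a small NumPy sketch for a two-layer MLP. It is not the authors' implementation: the function names (dense_mlp, select_columns, sparse_delta_mlp), the GELU activation, and the "largest summed absolute change" selection rule are all assumptions made for illustration. The sketch keeps the previous step's hidden activations and output, recomputes only a chosen subset of hidden columns, and adds the resulting delta to the cached output.

```python
import numpy as np


def gelu(x):
    # tanh approximation of GELU (the activation choice is an assumption)
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))


def dense_mlp(x, w1, w2):
    """Full MLP pass; returns hidden activations and output so both can be cached."""
    h = gelu(x @ w1)
    return h, h @ w2


def select_columns(h_prev, h_curr, k):
    """Hypothetical selection rule: the k hidden columns whose activations
    changed most (summed absolute difference over tokens) between two steps."""
    change = np.abs(h_curr - h_prev).sum(axis=0)
    return np.argpartition(change, -k)[-k:]


def sparse_delta_mlp(x, w1, w2, h_cache, out_cache, cols):
    """Recompute only the hidden columns in `cols`; reuse the cache for the rest.

    The output is updated with a column-sparse delta:
        out = out_cache + (h_new[:, cols] - h_cache[:, cols]) @ w2[cols, :]
    """
    h_new = gelu(x @ w1[:, cols])           # (tokens, k): only k columns of w1 are used
    delta = h_new - h_cache[:, cols]        # change in the recomputed activations
    out = out_cache + delta @ w2[cols, :]   # only k rows of w2 are used
    h_updated = h_cache.copy()
    h_updated[:, cols] = h_new              # refresh the cache for the next step
    return h_updated, out
```

If cols covers every hidden column, the update reproduces the dense output up to floating-point error; with fewer columns, the result is an approximation whose quality depends on how much the skipped activations changed. In this sketch select_columns needs dense activations from two steps, so a practical scheme would have to choose the columns in a way that does not require a full recomputation at every step.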

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Image Enhancement Techniques · Advanced Memory and Neural Computing