TinyFusion: Diffusion Transformers Learned Shallow
Gongfan Fang, Kunjun Li, Xinyin Ma, Xinchao Wang

TL;DR
TinyFusion is a learnable depth pruning method for diffusion transformers. It enables significant model compression and speedup with minimal performance loss, and it applies across various architectures.
Contribution
The paper proposes a differentiable, learnable pruning technique that optimizes post-fine-tuning performance, reducing diffusion transformer depth efficiently.
Findings
Produces shallow models at less than 7% of the original training cost.
Attains 2× speedup with competitive FID scores.
Outperforms existing importance-based pruning methods.
Abstract
Diffusion Transformers have demonstrated remarkable capabilities in image generation but often come with excessive parameterization, resulting in considerable inference overhead in real-world applications. In this work, we present TinyFusion, a depth pruning method designed to remove redundant layers from diffusion transformers via end-to-end learning. The core principle of our approach is to create a pruned model with high recoverability, allowing it to regain strong performance after fine-tuning. To accomplish this, we introduce a differentiable sampling technique to make pruning learnable, paired with a co-optimized parameter to simulate future fine-tuning. While prior works focus on minimizing loss or error after pruning, our method explicitly models and optimizes the post-fine-tuning performance of pruned models. Experimental results indicate that this learnable paradigm offers…
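To make the idea of "differentiable sampling" over pruning decisions more concrete, here is a minimal sketch in plain Python. It rests on assumptions not stated in the abstract: the sampler is the Gumbel-softmax relaxation (one common choice for differentiable categorical sampling), the pruning decision is a pick among all masks that keep `num_keep` of `num_layers` layers, and every name here (`candidate_masks`, `gumbel_softmax`, `sample_mask`) is hypothetical. It shows only the forward sampling step. In an autograd framework the relaxed weights would carry gradients back to the logits, and the co-optimized parameter that simulates fine-tuning is not modeled here.

```python
import itertools
import math
import random


def candidate_masks(num_layers, num_keep):
    """Enumerate every binary mask that keeps `num_keep` of `num_layers` layers."""
    masks = []
    for kept in itertools.combinations(range(num_layers), num_keep):
        masks.append([1 if i in kept else 0 for i in range(num_layers)])
    return masks


def gumbel_softmax(logits, tau, rng):
    """Relaxed one-hot weights over candidates (Gumbel-softmax, assumed sampler)."""
    noisy = []
    for logit in logits:
        # Clamp the uniform draw away from 0 and 1 so both logs are finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))  # Gumbel(0, 1) noise
        noisy.append((logit + gumbel) / tau)
    # Numerically stable softmax over the perturbed, temperature-scaled logits.
    peak = max(noisy)
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def sample_mask(logits, masks, tau, rng):
    """Sample one layer mask; also return the relaxed weights over all masks."""
    weights = gumbel_softmax(logits, tau, rng)
    hard_index = max(range(len(weights)), key=weights.__getitem__)
    return masks[hard_index], weights


rng = random.Random(0)
masks = candidate_masks(num_layers=4, num_keep=2)  # 6 candidate masks
logits = [0.0] * len(masks)  # uniform start; these would be the learnable parameters
mask, weights = sample_mask(logits, masks, tau=1.0, rng=rng)
```

Here `mask` is a list of four 0/1 entries with exactly two ones, marking which layers survive in this sample. Lowering `tau` pushes `weights` closer to a one-hot vector, which is the usual way to anneal such a relaxation toward a discrete decision.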
Taxonomy
Topics: Neural Networks and Applications
Methods: Diffusion · Pruning · Focus
