DiffSparse: Accelerating Diffusion Transformers with Learned Token Sparsity

Haowei Zhu; Ji Liu; Ziqiong Liu; Dong Li; Junhai Yong; Bin Wang; Emad Barsoum

arXiv:2604.03674 · cs.CV · April 7, 2026

TL;DR

DiffSparse introduces a learnable, differentiable framework for optimizing layer-wise token sparsity in diffusion transformers, cutting computational cost (by 54% on PixArt-α with 20 steps) while maintaining or improving image generation quality.

Contribution

The paper proposes an end-to-end sparsity optimization method for diffusion transformers that combines a learnable network with dynamic programming, improving efficiency without quality loss.

Findings

1. Reduces computational cost by 54% on PixArt-α with 20 steps
2. Improves efficiency across multiple diffusion-transformer models
3. Surpasses original models' generation metrics while saving computation

Abstract

Diffusion models demonstrate outstanding performance in image generation, but their multi-step inference mechanism requires immense computational cost. Previous works accelerate inference by leveraging layer or token cache techniques to reduce computational cost. However, these methods fail to achieve superior acceleration performance in few-step diffusion transformer models due to inefficient feature caching strategies, manually designed sparsity allocation, and the practice of retaining complete forward computations in several steps in these token cache methods. To tackle these challenges, we propose a differentiable layer-wise sparsity optimization framework for diffusion transformer models, leveraging token caching to reduce token computation costs and enhance acceleration. Our method optimizes layer-wise sparsity allocation in an end-to-end manner through a learnable network…
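The token caching idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `cached_layer`, the `keep_ratio` parameter, and the change-based importance score are all hypothetical, and the layer is assumed to act on each token independently (as an MLP block does), which sidesteps the token interactions of attention. The sketch recomputes only the highest-scoring tokens at a layer and reuses the previous step's cached outputs for the rest.

```python
import numpy as np

def cached_layer(layer_fn, x, cache, keep_ratio, scores):
    """Recompute layer_fn on the top-scoring tokens; reuse cache for the others.

    x:          (num_tokens, dim) layer inputs at the current step
    cache:      (num_tokens, dim) layer outputs saved from the previous step
    keep_ratio: fraction of tokens to recompute (1 - sparsity), hypothetical knob
    scores:     (num_tokens,) importance scores, an assumption of this sketch
    """
    num_tokens = x.shape[0]
    num_keep = max(1, int(round(keep_ratio * num_tokens)))
    # Indices of the tokens with the highest scores.
    keep_idx = np.argsort(-scores, kind="stable")[:num_keep]
    out = cache.copy()                      # start from the cached outputs
    out[keep_idx] = layer_fn(x[keep_idx])   # refresh only the selected tokens
    return out, keep_idx

# Toy token-wise layer standing in for a transformer MLP block.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4))
layer_fn = lambda t: np.tanh(t @ W)

x_prev = rng.normal(size=(8, 4))
cache = layer_fn(x_prev)                    # full computation at the previous step
x_new = x_prev + 0.1 * rng.normal(size=x_prev.shape)

# Assumed score: how much each token's input changed between steps.
scores = np.linalg.norm(x_new - x_prev, axis=1)
out, keep_idx = cached_layer(layer_fn, x_new, cache, keep_ratio=0.5, scores=scores)
print("recomputed tokens:", sorted(keep_idx.tolist()))
```

In this sketch `keep_ratio` is fixed by hand for one layer; the abstract's point is that such per-layer allocations are instead optimized end to end rather than set manually.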

Videos

DiffSparse: Accelerating Diffusion Transformers with Learned Token Sparsity · SlidesLive