TL;DR
DiffSparse introduces a learnable, differentiable framework for optimizing layer-wise sparsity in diffusion transformers. It cuts computational cost (by 54% on PixArt-α at 20 inference steps) while maintaining or improving image generation quality.
Contribution
The paper proposes an end-to-end method that optimizes layer-wise sparsity allocation for diffusion transformers using a learnable network and dynamic programming, improving efficiency without degrading generation quality.
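This summary does not say how the dynamic program is formulated. One plausible reading, offered only as an assumption, is a budgeted allocation: each layer picks one of a few discrete sparsity levels, each level has a compute cost and an estimated quality penalty, and the dynamic program finds the lowest-penalty combination that fits a total compute budget. The function name `allocate_sparsity`, the `loss` and `cost` tables, and the integer budget below are all hypothetical; in the paper the penalty estimates would presumably come from the learnable network.

```python
from typing import List, Tuple


def allocate_sparsity(
    loss: List[List[float]],
    cost: List[List[int]],
    budget: int,
) -> Tuple[float, List[int]]:
    """Pick one option per layer to minimize total loss within a cost budget.

    loss[l][k]: hypothetical quality penalty of layer l at sparsity option k.
    cost[l][k]: hypothetical integer compute cost of that option.
    Returns (total_loss, chosen option index per layer).
    """
    inf = float("inf")
    # best[b] = lowest total loss over the layers seen so far at exact cost b.
    best = [0.0] + [inf] * budget
    # parents[l][b] = (option, previous cost) that produced best[b] at layer l.
    parents: List[List[Tuple[int, int]]] = []

    for layer_loss, layer_cost in zip(loss, cost):
        new_best = [inf] * (budget + 1)
        parent = [(-1, -1)] * (budget + 1)
        for b, acc in enumerate(best):
            if acc == inf:
                continue
            for k, (pen, c) in enumerate(zip(layer_loss, layer_cost)):
                nb = b + c
                if nb <= budget and acc + pen < new_best[nb]:
                    new_best[nb] = acc + pen
                    parent[nb] = (k, b)
        best = new_best
        parents.append(parent)

    end = min(range(budget + 1), key=lambda b: best[b])
    if best[end] == inf:
        raise ValueError("no allocation fits the budget")

    # Walk the parent pointers backwards to recover the per-layer choices.
    choices: List[int] = []
    b = end
    for parent in reversed(parents):
        k, b_prev = parent[b]
        choices.append(k)
        b = b_prev
    choices.reverse()
    return best[end], choices


# Two layers, two options each: option 0 is dense, option 1 is sparser.
total, picks = allocate_sparsity(
    loss=[[0.0, 0.3], [0.0, 0.1]],
    cost=[[4, 2], [4, 2]],
    budget=6,
)
```

In this toy setup both layers cannot stay dense within the budget, so the allocation sparsifies the layer whose estimated penalty is smaller. A real system would need a differentiable relaxation to train the penalty estimates end to end, which this sketch does not attempt.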
Findings
Reduces computational cost by 54% on PixArt-α at 20 inference steps
Improves efficiency across multiple diffusion-transformer models
Surpasses original models' generation metrics while saving computation
Abstract
Diffusion models demonstrate outstanding performance in image generation, but their multi-step inference mechanism requires immense computational cost. Previous works accelerate inference by leveraging layer or token cache techniques to reduce computational cost. However, these methods fail to achieve superior acceleration performance in few-step diffusion transformer models due to inefficient feature caching strategies, manually designed sparsity allocation, and the practice of retaining complete forward computations in several steps in these token cache methods. To tackle these challenges, we propose a differentiable layer-wise sparsity optimization framework for diffusion transformer models, leveraging token caching to reduce token computation costs and enhance acceleration. Our method optimizes layer-wise sparsity allocation in an end-to-end manner through a learnable network…
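The token caching idea mentioned in the abstract can be illustrated with a minimal sketch: at a given layer, only a fraction of tokens is recomputed, and the remaining tokens reuse the output cached from an earlier step. This is not the paper's implementation. The function `cached_layer`, the per-token importance `scores`, and the per-token `layer_fn` are assumptions made for illustration; a real transformer layer with attention mixes tokens and would need more care.

```python
from typing import Callable, List, Sequence, Tuple

Token = List[float]


def cached_layer(
    tokens: Sequence[Token],
    cache: Sequence[Token],
    layer_fn: Callable[[Token], Token],
    scores: Sequence[float],
    sparsity: float,
) -> Tuple[List[Token], List[int]]:
    """Recompute the highest-scoring tokens and reuse the cache for the rest.

    sparsity is the fraction of tokens skipped at this layer (hypothetical
    per-layer knob; 0.0 means every token is recomputed).
    Returns the layer output and the indices that were recomputed.
    """
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    n = len(tokens)
    n_compute = max(1, round(n * (1.0 - sparsity)))
    # Rank tokens by the assumed importance score, highest first.
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    active = sorted(ranked[:n_compute])

    out = [list(c) for c in cache]  # start from the cached outputs
    for i in active:
        out[i] = layer_fn(tokens[i])  # refresh only the selected tokens
    return out, active


# Toy example: six 2-d tokens, a stand-in "layer" that doubles each value.
tokens = [[float(i), float(i) + 0.5] for i in range(6)]
cache = [[0.0, 0.0] for _ in range(6)]
scores = [0.1, 0.9, 0.5, 0.3, 0.8, 0.2]


def double(t: Token) -> Token:
    return [2.0 * v for v in t]


out, active = cached_layer(tokens, cache, double, scores, sparsity=0.5)
```

With `sparsity=0.5`, three of the six tokens pass through the layer and the other three keep their cached values, so the layer's token computation is roughly halved. The returned `out` can serve as the cache for the next step.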