LESA: Learnable Stage-Aware Predictors for Diffusion Model Acceleration
Peiliang Cai, Jiacheng Liu, Haowen Xu, Xinyu Wang, Chang Zou, Linfeng Zhang

TL;DR
LESA is a learnable, stage-aware predictor framework with a multi-stage, multi-expert architecture for accelerating diffusion models. It reports 5.00x to 6.25x speedups on image and video generation tasks, with quality close to or better than the reported reference.
Contribution
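The paper proposes a learnable, stage-aware predictor framework for diffusion model acceleration. A Kolmogorov-Arnold Network (KAN) learns temporal feature mappings from data, and a multi-stage, multi-expert design assigns specialized predictors to different noise-level stages.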
Findings
Achieves 5.00x acceleration on FLUX.1-dev with a 1.0% quality drop.
Attains a 6.25x speedup on Qwen-Image with a 20.2% quality improvement.
Realizes 5.00x acceleration on HunyuanVideo with a 24.7% PSNR gain.
Abstract
Diffusion models have achieved remarkable success in image and video generation tasks. However, the high computational demands of Diffusion Transformers (DiTs) pose a significant challenge to their practical deployment. While feature caching is a promising acceleration strategy, existing methods based on simple reusing or training-free forecasting struggle to adapt to the complex, stage-dependent dynamics of the diffusion process, often resulting in quality degradation and failing to maintain consistency with the standard denoising process. To address this, we propose a LEarnable Stage-Aware (LESA) predictor framework based on two-stage training. Our approach leverages a Kolmogorov-Arnold Network (KAN) to accurately learn temporal feature mappings from data. We further introduce a multi-stage, multi-expert architecture that assigns specialized predictors to different noise-level stages,…
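The stage-aware routing idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `stage_of`, `make_linear_expert`, and `run_with_cache` are hypothetical, the stages are assumed to be equal-width splits of the step range, and a fixed linear extrapolation stands in for the learned KAN predictor. The sketch only shows how a cached-feature forecast might be dispatched to a different expert depending on the denoising stage.

```python
from typing import Callable, List, Sequence

Feature = List[float]
# A predictor maps the two most recent fully computed features to a forecast.
Predictor = Callable[[Feature, Feature], Feature]


def stage_of(step: int, num_steps: int, num_stages: int) -> int:
    """Map a denoising step index to one of num_stages equal-width stages.

    Equal-width stages are an assumption made for this sketch.
    """
    return min(step * num_stages // num_steps, num_stages - 1)


def make_linear_expert(weight: float) -> Predictor:
    """Stand-in expert: extrapolate along the last feature difference.

    A learned predictor (a KAN in the paper) would take the place of this
    fixed rule; the weight here is illustrative, not trained.
    """
    def predict(prev: Feature, last: Feature) -> Feature:
        return [b + weight * (b - a) for a, b in zip(prev, last)]
    return predict


def run_with_cache(
    full_forward: Callable[[int], Feature],
    experts: Sequence[Predictor],
    num_steps: int,
    interval: int,
) -> List[Feature]:
    """Run the full model every `interval` steps and forecast the rest.

    Skipped steps are forecast by the expert assigned to the current stage,
    using the two most recent fully computed features.
    """
    history: List[Feature] = []
    outputs: List[Feature] = []
    for step in range(num_steps):
        if step % interval == 0 or len(history) < 2:
            feat = full_forward(step)            # expensive full computation
            history = (history + [feat])[-2:]    # keep the last two features
        else:
            expert = experts[stage_of(step, num_steps, len(experts))]
            feat = expert(history[0], history[1])  # cheap forecast
        outputs.append(feat)
    return outputs


if __name__ == "__main__":
    # Toy "model" whose feature is just the step index.
    experts = [make_linear_expert(0.0), make_linear_expert(1.0)]
    print(run_with_cache(lambda s: [float(s)], experts, num_steps=8, interval=4))
```

In this toy run the first expert (weight 0.0) simply reuses the last cached feature, while the second extrapolates linearly, so the two stages behave differently even though they share the same cache. For simplicity the sketch forecasts every skipped step in an interval from the same two cached features; a real predictor would presumably also condition on how far the step is from the last full computation.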
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning · Image Enhancement Techniques
