LESA: Learnable Stage-Aware Predictors for Diffusion Model Acceleration

Peiliang Cai; Jiacheng Liu; Haowen Xu; Xinyu Wang; Chang Zou; Linfeng Zhang

arXiv:2602.20497 · cs.CV · March 17, 2026

TL;DR

LESA is a learnable, stage-aware predictor framework with a multi-stage, multi-expert architecture for accelerating diffusion models. It reports speedups of 5.00x to 6.25x on image and video generation (FLUX.1-dev, Qwen-Image, HunyuanVideo), with a 1.0% quality drop on FLUX.1-dev.

Contribution

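The paper proposes a learnable, stage-aware predictor framework for diffusion model acceleration. Kolmogorov-Arnold Network (KAN) predictors learn temporal feature mappings from data, and a multi-stage, multi-expert design assigns specialized predictors to different noise-level stages.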

Findings

01. Achieves 5.00x acceleration on FLUX.1-dev with a 1.0% quality drop.

02. Attains a 6.25x speedup on Qwen-Image with a 20.2% quality improvement.

03. Realizes 5.00x acceleration on HunyuanVideo with a 24.7% PSNR gain.

Abstract

Diffusion models have achieved remarkable success in image and video generation tasks. However, the high computational demands of Diffusion Transformers (DiTs) pose a significant challenge to their practical deployment. While feature caching is a promising acceleration strategy, existing methods based on simple reusing or training-free forecasting struggle to adapt to the complex, stage-dependent dynamics of the diffusion process, often resulting in quality degradation and failing to maintain consistency with the standard denoising process. To address this, we propose a LEarnable Stage-Aware (LESA) predictor framework based on two-stage training. Our approach leverages a Kolmogorov-Arnold Network (KAN) to accurately learn temporal feature mappings from data. We further introduce a multi-stage, multi-expert architecture that assigns specialized predictors to different noise-level stages,…
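The idea of routing cached features to stage-specific predictors can be illustrated with a minimal sketch. This is not the paper's implementation: the names `stage_index`, `LinearExpert`, and `run_with_cache`, the single stage boundary at 0.5, and the refresh `interval` are all hypothetical, and a scaled linear extrapolation stands in for the learned KAN predictor purely to keep the example self-contained.

```python
from bisect import bisect_right


def stage_index(t, boundaries):
    """Map a noise level t in [0, 1] to a stage id using sorted boundaries."""
    return bisect_right(boundaries, t)


class LinearExpert:
    """Stand-in for a learned predictor (NOT a KAN): scaled linear extrapolation."""

    def __init__(self, weight):
        self.weight = weight  # hypothetical per-stage parameter; would be learned

    def predict(self, older, newer, step):
        # Each cache entry is (step index, feature vector).
        (i0, f0), (i1, f1) = older, newer
        scale = self.weight * (step - i1) / (i1 - i0)
        return [b + scale * (b - a) for a, b in zip(f0, f1)]


def run_with_cache(timesteps, compute_features, experts, boundaries, interval):
    """Fully compute features every `interval` steps; predict the rest."""
    cache = []    # the two most recent fully computed (step, feature) pairs
    outputs = []
    for i, t in enumerate(timesteps):
        if i % interval == 0 or len(cache) < 2:
            feat = compute_features(t)            # expensive full forward pass
            cache = (cache + [(i, feat)])[-2:]
        else:
            # Pick the expert assigned to the current noise-level stage.
            expert = experts[stage_index(t, boundaries)]
            feat = expert.predict(cache[0], cache[1], i)
        outputs.append(feat)
    return outputs


# Toy usage: features that are linear in t, so extrapolation is exact.
calls = []


def toy_features(t):
    calls.append(t)
    return [t, 2.0 * t]


timesteps = [1.0 - 0.1 * i for i in range(11)]
experts = [LinearExpert(1.0), LinearExpert(1.0)]  # one expert per stage
outputs = run_with_cache(timesteps, toy_features, experts, [0.5], interval=4)
print(len(calls), "full computations for", len(outputs), "steps")
```

In this toy run only four of the eleven steps call `toy_features`; the remaining steps are filled in by whichever expert owns the current stage. A learned predictor would replace `LinearExpert`, with separate parameters per stage.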

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning · Image Enhancement Techniques