Forecasting When to Forecast: Accelerating Diffusion Models with Confidence-Gated Taylor
Xiaoliu Guan, Lielin Jiang, Hanqi Chen, Xu Zhang, Jiaxing Yan, Guanzhong Wang, Yi Liu, Zetao Zhang, Yu Wu

TL;DR
This paper introduces a dynamic Taylor-based acceleration method for diffusion transformers that decides, step by step, whether to use a Taylor forecast or run the full computation. It reports speedups of 2.36x to 4.14x with minimal quality loss.
Contribution
The method moves Taylor prediction from the module level to the last-block level and replaces the fixed caching schedule with error-based dynamic caching, improving both the speed and the reliability of diffusion model inference.
Findings
Achieves 3.17x acceleration on FLUX
Achieves 2.36x acceleration on DiT
Achieves 4.14x acceleration on Wan Video
Abstract
Diffusion Transformers (DiTs) have demonstrated remarkable performance in visual generation tasks. However, their low inference speed limits their deployment in low-resource applications. Recent training-free approaches exploit the redundancy of features across timesteps by caching and reusing past representations to accelerate inference. Building on this idea, TaylorSeer instead uses cached features to predict future ones via Taylor expansion. However, its module-level prediction across all transformer blocks (e.g., attention or feedforward modules) requires storing fine-grained intermediate features, leading to notable memory and computation overhead. Moreover, it adopts a fixed caching schedule without considering the varying accuracy of predictions across timesteps, which can lead to degraded outputs when prediction fails. To address these limitations, we propose a novel approach to…
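The abstract describes two ideas: forecasting future features from cached ones via Taylor expansion, and not applying every forecast blindly. The sketch below illustrates both in NumPy under stated assumptions. It is not the authors' implementation. The function names, the backward-difference derivative estimate, the "probe" feature used to judge a forecast, and the relative-error threshold are all hypothetical choices made for illustration, since the truncated abstract does not specify the gating rule.

```python
import numpy as np
from math import factorial


def finite_differences(history, interval):
    """Estimate derivatives of order 0..len(history)-1 at the newest cached point.

    history: feature arrays from fully computed steps, oldest first,
             spaced `interval` steps apart (an assumption of this sketch).
    """
    diffs = [np.asarray(h, dtype=float) for h in history]
    derivs = []
    order = 0
    while diffs:
        # The newest backward difference of this order approximates the derivative.
        derivs.append(diffs[-1] / interval**order)
        diffs = [b - a for a, b in zip(diffs[:-1], diffs[1:])]
        order += 1
    return derivs


def taylor_forecast(derivs, k):
    """Forecast the feature k steps past the newest cached point."""
    return sum(d * (k**i) / factorial(i) for i, d in enumerate(derivs))


def relative_error(pred, actual, eps=1e-8):
    """L2 relative error between a forecast and the true value."""
    return float(np.linalg.norm(pred - actual) / (np.linalg.norm(actual) + eps))


def gated_step(derivs_out, derivs_probe, actual_probe, k, threshold, full_forward):
    """Hypothetical confidence gate.

    A cheap-to-compute probe feature is forecast and compared with its true
    value. If the forecast is accurate enough, the forecast of the expensive
    output is used; otherwise the full forward pass runs.
    Returns (feature, used_forecast).
    """
    pred_probe = taylor_forecast(derivs_probe, k)
    if relative_error(pred_probe, actual_probe) <= threshold:
        return taylor_forecast(derivs_out, k), True
    return full_forward(), False
```

For example, with a feature that evolves linearly across cached steps, the forecast is exact, and the gate falls back to full computation only when the probe forecast disagrees with the probe's true value by more than the threshold. In a real pipeline, `full_forward` would be the model's complete pass at that timestep, and the cache would be refreshed whenever it runs.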
