Forecast then Calibrate: Feature Caching as ODE for Efficient Diffusion Transformers

Shikang Zheng; Liang Feng; Xinyu Wang; Qinming Zhou; Peiliang Cai; Chang Zou; Jiacheng Liu; Yuqi Lin; Junjie Chen; Yue Ma; Linfeng Zhang

arXiv:2508.16211 · cs.CV · August 25, 2025

TL;DR

This paper introduces FoCa, which treats feature caching in diffusion transformers as an ODE-solving problem. It speeds up inference while keeping generation quality high across image synthesis, video generation, and super-resolution.

Contribution

The paper models feature caching as a feature-ODE solving problem, which enables robust acceleration of diffusion transformers without additional training.

Findings

01. Achieves near-lossless speedups of 5.50x on FLUX and 6.45x on HunyuanVideo.

02. Maintains high quality with a 4.53x speedup on DiT.

03. Demonstrates effectiveness across image synthesis, video generation, and super-resolution tasks.

Abstract

Diffusion Transformers (DiTs) have demonstrated exceptional performance in high-fidelity image and video generation. To reduce their substantial computational costs, feature caching techniques have been proposed to accelerate inference by reusing hidden representations from previous timesteps. However, current methods often struggle to maintain generation quality at high acceleration ratios, where prediction errors increase sharply due to the inherent instability of long-step forecasting. In this work, we adopt an ordinary differential equation (ODE) perspective on the hidden-feature sequence, modeling layer representations along the trajectory as a feature-ODE. We attribute the degradation of existing caching strategies to their inability to robustly integrate historical features under large skipping intervals. To address this, we propose FoCa (Forecast-then-Calibrate), which treats…
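
The abstract is cut off before it describes FoCa itself, so the sketch below does not implement the paper's method. It only illustrates the two ideas the abstract does state: a skipped timestep can be served either by reusing the last cached feature or by forecasting from cached history, and forecast error grows as the skipping interval gets longer. The `feature` function is a hypothetical smooth trajectory standing in for a layer's hidden representation, and the linear extrapolation is an assumed, generic forecaster chosen for brevity.

```python
import numpy as np


def feature(t):
    # Hypothetical stand-in for one layer's hidden feature at time t.
    # A real DiT feature would come from a full forward pass.
    return np.array([np.sin(t), np.cos(2.0 * t)])


def reuse(cache):
    # Plain caching: return the most recently computed feature unchanged.
    return cache[-1][1]


def forecast(cache, t):
    # Linear extrapolation from the last two cached (time, feature) pairs.
    (t0, f0), (t1, f1) = cache[-2], cache[-1]
    slope = (f1 - f0) / (t1 - t0)  # finite-difference estimate of df/dt
    return f1 + slope * (t - t1)


def forecast_error(cache, t):
    # Distance between the forecast and the feature a full pass would give.
    return np.linalg.norm(forecast(cache, t) - feature(t))


# Two fully computed steps populate the cache.
cache = [(0.0, feature(0.0)), (0.1, feature(0.1))]

# Short skip: forecasting is closer to the truth than plain reuse.
t_skip = 0.2
err_reuse = np.linalg.norm(reuse(cache) - feature(t_skip))
err_forecast = forecast_error(cache, t_skip)

# Long skip: the same forecaster drifts much further from the truth.
err_forecast_long = forecast_error(cache, 0.6)

print(f"reuse error at t=0.2:    {err_reuse:.4f}")
print(f"forecast error at t=0.2: {err_forecast:.4f}")
print(f"forecast error at t=0.6: {err_forecast_long:.4f}")
```

On this toy trajectory the forecast beats reuse for a short skip but degrades for a longer one, which is the failure mode the abstract attributes to existing caching strategies at high acceleration ratios.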
