TaoCache: Structure-Maintained Video Generation Acceleration
Zhentao Fan, Zongzuo Wang, Weiwei Zhang

TL;DR
TaoCache is a novel, training-free caching method for video diffusion models that preserves structural integrity during acceleration, especially in late denoising stages, leading to higher visual quality.
Contribution
It introduces a fixed-point based caching strategy that improves structure preservation in accelerated video diffusion without additional training.
Findings
Significantly improves visual quality metrics over prior methods.
Effective in late denoising stages for structure preservation.
Integrates with existing DiT-based frameworks and complementary accelerations such as PAB and TeaCache.
Abstract
Existing cache-based acceleration methods for video diffusion models primarily skip early or mid denoising steps. This often produces structural discrepancies relative to full-timestep generation and can hinder instruction following and character consistency. We present TaoCache, a training-free, plug-and-play caching strategy that, instead of residual-based caching, adopts a fixed-point perspective to predict the model's noise output and is particularly effective in late denoising stages. By calibrating cosine similarities and norm ratios of consecutive noise deltas, TaoCache preserves high-resolution structure while enabling aggressive skipping. The approach is orthogonal to complementary accelerations such as Pyramid Attention Broadcast (PAB) and TeaCache, and it integrates seamlessly into DiT-based frameworks. Across Latte-1, OpenSora-Plan v110, and Wan2.1, TaoCache attains…
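The abstract does not give TaoCache's exact calibration rule. As a rough illustration of how the cosine similarity and norm ratio of consecutive noise deltas could drive a skip decision, the Python sketch below uses a simple rule that is an assumption, not the paper's algorithm. If the two most recent deltas point in nearly the same direction (cosine above a hypothetical `cos_threshold`), it skips the model call and extrapolates the next noise output by scaling the latest delta with their norm ratio. Otherwise it signals that the full model should run. The function name `predict_next_noise` and the threshold value are hypothetical, and the one-step linear extrapolation stands in for the paper's fixed-point formulation.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vector = List[float]


def sub(a: Sequence[float], b: Sequence[float]) -> Vector:
    """Element-wise difference a - b."""
    return [x - y for x, y in zip(a, b)]


def norm(a: Sequence[float]) -> float:
    """Euclidean (L2) norm."""
    return math.sqrt(sum(x * x for x in a))


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is zero."""
    na, nb = norm(a), norm(b)
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def predict_next_noise(
    eps_prev2: Sequence[float],
    eps_prev1: Sequence[float],
    eps_curr: Sequence[float],
    cos_threshold: float = 0.95,  # hypothetical value, not from the paper
) -> Tuple[Optional[Vector], bool]:
    """Hypothetical skip rule built on consecutive noise deltas.

    Returns (predicted_noise, skipped). When skipped is False the caller
    should run the full model for the next step instead.
    """
    d_prev = sub(eps_prev1, eps_prev2)  # older noise delta
    d_curr = sub(eps_curr, eps_prev1)   # most recent noise delta
    n_prev = norm(d_prev)
    if n_prev == 0.0 or cosine(d_prev, d_curr) < cos_threshold:
        # Deltas disagree in direction (or are degenerate): do not skip.
        return None, False
    ratio = norm(d_curr) / n_prev  # how fast the deltas are shrinking or growing
    # Assume the next delta keeps the current direction, scaled by the ratio.
    predicted = [e + ratio * d for e, d in zip(eps_curr, d_curr)]
    return predicted, True
```

For noise outputs `[0, 0]`, `[1, 1]`, `[1.5, 1.5]`, the two deltas are aligned and the norm ratio is 0.5, so the sketch skips the model call and predicts approximately `[1.75, 1.75]`. When the deltas are orthogonal, as with `[0, 0]`, `[1, 0]`, `[1, 1]`, it declines to skip.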
