Adaptive Hybrid Caching for Efficient Text-to-Video Diffusion Model Acceleration
Yuanxin Wei, Lansong Diao, Bujiao Chen, Shenggan Cheng, Zhengping Qian, Wenyuan Yu, Nong Xiao, Wei Lin, Jiangsu Du

TL;DR
This paper introduces MixCache, a flexible, training-free caching framework that adaptively optimizes caching granularity to accelerate video diffusion models without compromising quality.
Contribution
It presents a novel adaptive hybrid caching strategy that dynamically balances quality and speed in video diffusion model inference, surpassing existing single-granularity methods.
Findings
Achieves up to 1.97x speedup in video generation
Maintains high generation quality with improved efficiency
Demonstrates effectiveness across diverse models
Abstract
Efficient video generation models are increasingly vital for multimedia synthetic content generation. Leveraging the Transformer architecture and the diffusion process, video DiT models have emerged as a dominant approach for high-quality video generation. However, their multi-step iterative denoising process incurs high computational cost and inference latency. Caching, a widely adopted optimization method in DiT models, leverages the redundancy in the diffusion process to skip computations at different granularities (e.g., step, CFG, block). Nevertheless, existing caching methods are limited to single-granularity strategies and struggle to balance generation quality and inference speed flexibly. In this work, we propose MixCache, a training-free, caching-based framework for efficient video DiT inference. It first distinguishes the interference and boundary between different…
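To make the three granularities named in the abstract concrete, here is a minimal toy sketch in Python. It is not MixCache's algorithm: the class `ToyCachedModel`, the scalar "blocks", the fixed refresh `interval`, and the choice of which blocks to cache are all assumptions made for illustration, whereas the abstract describes an adaptive scheme. The sketch only shows what each granularity reuses: the whole step output, the gap between the conditional and unconditional passes of classifier-free guidance (CFG), or individual block residuals.

```python
from typing import Callable, Dict, List, Optional, Tuple

Block = Callable[[float, float], float]


def make_blocks(n: int) -> List[Block]:
    # Each toy "block" returns a residual that depends on the input x and
    # a conditioning value c. Real DiT blocks operate on tensors.
    return [lambda x, c, k=k: 0.1 * (k + 1) * (c - x) for k in range(n)]


class ToyCachedModel:
    """Hypothetical denoiser with one fixed caching granularity."""

    def __init__(self, granularity: str = "none", n_blocks: int = 4,
                 interval: int = 2, cached_blocks: Tuple[int, ...] = (1, 2)):
        self.granularity = granularity      # "none", "step", "cfg", or "block"
        self.blocks = make_blocks(n_blocks)
        self.interval = interval            # recompute everything every `interval` steps
        self.cached_blocks = cached_blocks  # blocks reused under "block" caching
        self.block_calls = 0                # number of block evaluations performed
        self._residuals: Dict[Tuple[str, int], float] = {}
        self._last_output: Optional[float] = None
        self._last_cfg_delta: Optional[float] = None

    def _forward(self, x: float, c: float, branch: str, refresh: bool) -> float:
        h = x
        for k, block in enumerate(self.blocks):
            reuse = (self.granularity == "block" and not refresh
                     and k in self.cached_blocks)
            if reuse:
                r = self._residuals[(branch, k)]   # block-level reuse
            else:
                r = block(h, c)
                self.block_calls += 1
                self._residuals[(branch, k)] = r
            h = h + r
        return h

    def predict(self, x: float, step: int, guidance: float = 2.0) -> float:
        refresh = step % self.interval == 0  # step 0 always fills the caches
        if self.granularity == "step" and not refresh:
            return self._last_output         # step-level reuse: skip the model
        cond = self._forward(x, 1.0, "cond", refresh)
        if self.granularity == "cfg" and not refresh:
            delta = self._last_cfg_delta     # CFG-level reuse: skip uncond pass
        else:
            uncond = self._forward(x, 0.0, "uncond", refresh)
            delta = cond - uncond
            self._last_cfg_delta = delta
        # Equivalent to uncond + guidance * (cond - uncond).
        out = cond + (guidance - 1.0) * delta
        self._last_output = out
        return out


def count_calls(granularity: str, steps: int = 8) -> int:
    """Run a toy denoising loop and return the number of block evaluations."""
    model = ToyCachedModel(granularity=granularity)
    x = 0.0
    for step in range(steps):
        x = x - 0.1 * model.predict(x, step)
    return model.block_calls


if __name__ == "__main__":
    for g in ("none", "step", "cfg", "block"):
        print(g, count_calls(g))
```

In this toy setup, step-level caching skips the most work on non-refresh steps, while CFG-level and block-level caching skip less but recompute part of the model on the current input. This is the kind of quality-versus-speed trade-off that a single-granularity method fixes in advance.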
Taxonomy
Topics: Advanced Data Compression Techniques · Image and Signal Denoising Methods · Video Coding and Compression Technologies
