Adaptive Hybrid Caching for Efficient Text-to-Video Diffusion Model Acceleration

Yuanxin Wei; Lansong Diao; Bujiao Chen; Shenggan Cheng; Zhengping Qian; Wenyuan Yu; Nong Xiao; Wei Lin; Jiangsu Du

arXiv:2508.12691 · cs.GR · February 27, 2026

Open Access

TL;DR

This paper introduces MixCache, a flexible, training-free caching framework that adaptively optimizes caching granularity to accelerate video diffusion models without compromising quality.

Contribution

It presents a novel adaptive hybrid caching strategy that dynamically balances quality and speed in video diffusion model inference, surpassing existing single-granularity methods.

Findings

01 Achieves up to 1.97x speedup in video generation

02 Maintains high generation quality with improved efficiency

03 Demonstrates effectiveness across diverse models

Abstract

Efficient video generation models are increasingly vital for multimedia synthetic content generation. Leveraging the Transformer architecture and the diffusion process, video DiT models have emerged as a dominant approach for high-quality video generation. However, their multi-step iterative denoising process incurs high computational cost and inference latency. Caching, a widely adopted optimization method in DiT models, leverages the redundancy in the diffusion process to skip computations in different granularities (e.g., step, cfg, block). Nevertheless, existing caching methods are limited to single-granularity strategies, struggling to balance generation quality and inference speed in a flexible manner. In this work, we propose MixCache, a training-free caching-based framework for efficient video DiT inference. It first distinguishes the interference and boundary between different…

Taxonomy

Topics: Advanced Data Compression Techniques · Image and Signal Denoising Methods · Video Coding and Compression Technologies