Accelerating Diffusion-based Video Editing via Heterogeneous Caching: Beyond Full Computing at Sampled Denoising Timestep

Tianyi Liu, Ye Lu, Linfeng Zhang, Chen Cai, Jianjun Gao, Yi Wang, Kim-Hui Yap, Lap-Pui Chau

arXiv:2603.24260 · cs.CV · March 26, 2026

TL;DR

This paper introduces HetCache, a novel caching framework that accelerates diffusion-based video editing by selectively reusing cached token computations according to each token's relevance, significantly reducing computation while preserving quality.

Contribution

HetCache exploits token heterogeneity in diffusion models, enabling efficient caching strategies that improve speed without sacrificing editing fidelity.
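
The token-level reuse idea can be illustrated with a toy, pure-Python sketch. It assumes a single attention head with no learned projections and uses a hypothetical relevance score (the L1 change of each token since it was last cached); the names `attend`, `token_change`, `cached_attention`, and the `ratio` parameter are illustrative only and are not HetCache's actual criterion or API. Only the selected tokens are recomputed as queries, and they still attend over all current tokens; every other token keeps its cached output.

```python
import math
from typing import List, Sequence, Tuple

Vec = List[float]


def attend(query: Vec, keys: Sequence[Vec], values: Sequence[Vec]) -> Vec:
    # Scaled dot-product attention for one query (toy: no Q/K/V projections).
    d = len(query)
    logits = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    m = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(l - m) for l in logits]
    total = sum(weights)
    return [sum(w * v[c] for w, v in zip(weights, values)) / total for c in range(d)]


def token_change(curr: Sequence[Vec], prev: Sequence[Vec]) -> List[float]:
    # Hypothetical relevance score: L1 distance between each token now and when cached.
    return [sum(abs(a - b) for a, b in zip(c, p)) for c, p in zip(curr, prev)]


def cached_attention(
    tokens: Sequence[Vec],
    cached_tokens: Sequence[Vec],
    cached_out: Sequence[Vec],
    ratio: float,
) -> Tuple[List[Vec], List[Vec], List[int]]:
    """Recompute attention rows only for the most-changed tokens; reuse the rest."""
    scores = token_change(tokens, cached_tokens)
    k = max(1, round(ratio * len(tokens)))
    active = sorted(sorted(range(len(tokens)), key=lambda i: -scores[i])[:k])
    out = [row[:] for row in cached_out]         # start from cached outputs
    new_cache = [t[:] for t in cached_tokens]
    for i in active:
        # Selected queries attend over *all* current tokens as keys/values.
        out[i] = attend(tokens[i], tokens, tokens)
        new_cache[i] = tokens[i][:]              # refresh this token's cache entry
    return out, new_cache, active


# Usage: a full pass at one step, then a partial pass at the next step.
x0 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
full0 = [attend(t, x0, x0) for t in x0]
x1 = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0], [0.5, 0.6]]  # token 2 changed most
out1, cache1, active = cached_attention(x1, x0, full0, ratio=0.25)
print(active)  # [2]
```

Here only token 2, whose input moved the most, is recomputed; the other rows are copied from the full pass. The trade-off is that reused rows were computed against stale keys and values, which is the approximation such a cache accepts in exchange for skipping work.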

Findings

01 Achieves a 2.67× latency speedup and FLOPs reduction
02 Maintains high editing quality with negligible degradation
03 Demonstrates effectiveness across multiple diffusion models

Abstract

Diffusion-based video editing has emerged as an important paradigm for high-quality and flexible content generation. However, despite their generality and strong modeling capacity, Diffusion Transformers (DiTs) remain computationally expensive because of the iterative denoising process, which poses challenges for practical deployment. Existing video diffusion acceleration methods primarily exploit feature reuse at the level of denoising timesteps. This mitigates redundancy in the denoising process but overlooks the architectural redundancy within the DiT: many attention operations over spatio-temporal tokens are executed redundantly and contribute little to nothing to the model output. This work introduces HetCache, a training-free diffusion acceleration framework designed to exploit the inherent heterogeneity in diffusion-based masked video-to-video (MV2V) generation and editing.…
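
To contrast timestep-level reuse with reuse inside the sampled timesteps, here is a toy cost model, offered as an assumption-laden sketch rather than the paper's analysis: cached steps are treated as free, and a step's cost is assumed to scale linearly with the fraction of tokens recomputed. The helpers `full_compute_steps` and `relative_cost` and the parameters `interval` and `token_ratio` are hypothetical. Setting `token_ratio=1.0` models full computation at every sampled timestep; values below 1.0 model additional token-level reuse within those steps. The model does not reproduce the reported 2.67× figure.

```python
from typing import List


def full_compute_steps(num_steps: int, interval: int) -> List[int]:
    """Timestep-level reuse: indices of denoising steps that run the DiT.

    All other steps reuse features cached at the most recent computed step.
    """
    return [t for t in range(num_steps) if t % interval == 0]


def relative_cost(num_steps: int, interval: int, token_ratio: float = 1.0) -> float:
    """Toy cost relative to running every step on every token.

    Illustrative assumptions: cached steps cost nothing, and a computed step's
    cost scales linearly with the fraction of tokens recomputed (`token_ratio`).
    """
    if num_steps <= 0 or interval <= 0 or not 0.0 < token_ratio <= 1.0:
        raise ValueError("invalid arguments")
    computed = len(full_compute_steps(num_steps, interval))
    return computed * token_ratio / num_steps


print(full_compute_steps(10, 3))   # [0, 3, 6, 9]
print(relative_cost(50, 2))        # 0.5  (timestep-level reuse only)
print(relative_cost(50, 2, 0.5))   # 0.25 (plus token-level reuse at sampled steps)
```

Under these assumptions the two kinds of reuse compose multiplicatively, which is why going beyond full computation at the sampled timesteps can add savings on top of timestep-level caching; real costs depend on attention's quadratic terms and on which projections must still run for every token.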

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Image Enhancement Techniques · Caching and Content Delivery