Forcing-KV: Hybrid KV Cache Compression for Efficient Autoregressive Video Diffusion Models

Yicheng Ji; Zhizhou Zhong; Jun Zhang; Qin Yang; XiTai Jin; Ying Qin; Wenhan Luo; Shuiyang Mao; Wei Liu; Huan Li

arXiv:2605.09681·cs.CV·May 12, 2026



TL;DR

This paper introduces Forcing-KV, a hybrid cache compression method for autoregressive video diffusion models that reduces memory usage and accelerates generation without sacrificing quality.

Contribution

The paper presents a head-wise functional analysis of attention heads and a hybrid pruning strategy for KV cache compression in AR video diffusion models.

Findings

01. Achieves over 29 fps generation speed on a single GPU.

02. Reduces cache memory by 30%.

03. Provides up to 2.82x speedup at higher resolutions.

Abstract

Autoregressive (AR) video diffusion models adopt a streaming generation framework, enabling long-horizon video generation with real-time responsiveness, as exemplified by the Self Forcing training paradigm. However, existing AR video diffusion models still suffer from significant attention complexity and severe memory overhead due to the redundant key-value (KV) caches across historical frames, which limits scalability. In this paper, we tackle this challenge by introducing KV cache compression into autoregressive video diffusion. We observe that attention heads in mainstream AR diffusion models exhibit markedly distinct attention patterns and functional roles that remain stable across samples and denoising steps. Building on our empirical study of head-wise functional specialization, we divide the attention heads into two categories: static heads, which focus on transitions across…
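
To make the memory argument concrete, here is a minimal sketch of how a per-head retention budget changes KV cache size. It is not the paper's method: the `CacheShape` fields, the function names, and all numbers below are hypothetical, and the sketch only assumes the standard accounting in which each cached token stores one key and one value vector per head per layer.

```python
from dataclasses import dataclass


@dataclass
class CacheShape:
    # All fields are illustrative assumptions, not values from the paper.
    num_layers: int
    num_heads: int
    head_dim: int
    bytes_per_elem: int = 2  # e.g. fp16 or bf16


def kv_cache_bytes(shape: CacheShape, tokens_per_head: list[int]) -> int:
    """Bytes needed when head h keeps tokens_per_head[h] cached tokens in every layer."""
    if len(tokens_per_head) != shape.num_heads:
        raise ValueError("need one token budget per head")
    # Each cached token stores one key and one value vector of size head_dim.
    bytes_per_token = 2 * shape.head_dim * shape.bytes_per_elem
    return shape.num_layers * bytes_per_token * sum(tokens_per_head)


def cache_savings(shape: CacheShape, full_tokens: int, tokens_per_head: list[int]) -> float:
    """Fraction of cache memory saved relative to every head keeping full_tokens."""
    full = kv_cache_bytes(shape, [full_tokens] * shape.num_heads)
    pruned = kv_cache_bytes(shape, tokens_per_head)
    return 1.0 - pruned / full


# Hypothetical setup: half of the heads keep the full history,
# the other half keep only half of it.
shape = CacheShape(num_layers=30, num_heads=12, head_dim=128)
budgets = [1000] * 6 + [500] * 6
print(cache_savings(shape, full_tokens=1000, tokens_per_head=budgets))
```

Because the cache grows linearly with the number of retained tokens per head, pruning only a subset of heads still yields a proportional saving. Which heads can be pruned, and by how much, is what the paper's head-wise analysis is meant to decide.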
