Past- and Future-Informed KV Cache Policy with Salience Estimation in Autoregressive Video Diffusion
Hanmo Chen, Chenghao Xu, Xu Yang, Xuan Chen, Cheng Deng

TL;DR
This paper introduces PaFu-KV, a salience-aware cache policy for autoregressive video generation that improves quality and efficiency by selectively retaining important tokens based on estimated salience scores.
Contribution
The paper proposes a novel Past- and Future-Informed KV Cache Policy with a salience estimation head, enhancing token retention decisions in long-term video generation.
Findings
Improves video quality by retaining critical tokens.
Reduces memory footprint and accelerates inference.
Maintains high-fidelity video generation on benchmarks.
Abstract
Video generation is pivotal to digital media creation, and recent advances in autoregressive video generation have markedly enhanced the efficiency of real-time video synthesis. However, existing approaches generally rely on heuristic KV Cache policies, which ignore differences in token importance in long-term video generation. This leads to the loss of critical spatiotemporal information and the accumulation of redundant, invalid cache, thereby degrading video generation quality and efficiency. To address this limitation, we first observe that token contributions to video generation are highly time-heterogeneous and accordingly propose a novel Past- and Future-Informed KV Cache Policy (PaFu-KV). Specifically, PaFu-KV introduces a lightweight Salience Estimation Head distilled from a bidirectional teacher to estimate salience scores, allowing the KV cache to retain informative tokens…
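The retention idea described in the abstract can be illustrated with a minimal sketch. It assumes each cached token already has a scalar salience score and that the cache keeps a fixed budget of the highest-scoring tokens. The top-k rule, the function names (`select_tokens_to_keep`, `prune_kv_cache`), and the `budget` parameter are illustrative assumptions rather than the authors' method; in the paper the scores come from a learned Salience Estimation Head, which is not modeled here.

```python
from heapq import nlargest


def select_tokens_to_keep(salience, budget):
    """Return indices of the `budget` highest-salience tokens, in original order.

    Hypothetical helper: a plain top-k rule stands in for the paper's policy.
    """
    n = len(salience)
    if budget <= 0:
        return []
    if budget >= n:
        return list(range(n))
    # Pick the top-`budget` indices by score, then restore temporal order.
    top = nlargest(budget, range(n), key=lambda i: salience[i])
    return sorted(top)


def prune_kv_cache(keys, values, salience, budget):
    """Drop low-salience entries from parallel key/value lists.

    `salience` is assumed to be given; here it is just a list of floats.
    """
    keep = select_tokens_to_keep(salience, budget)
    pruned_keys = [keys[i] for i in keep]
    pruned_values = [values[i] for i in keep]
    return pruned_keys, pruned_values, keep


# Usage: six cached tokens, keep the three most salient.
keys = ["k0", "k1", "k2", "k3", "k4", "k5"]
values = ["v0", "v1", "v2", "v3", "v4", "v5"]
scores = [0.10, 0.90, 0.30, 0.80, 0.05, 0.70]
kept_keys, kept_values, kept_idx = prune_kv_cache(keys, values, scores, budget=3)
```

Sorting the selected indices keeps the surviving tokens in their original temporal order, which matters when positions encode frame order. A heuristic policy such as a sliding window would instead keep only the most recent entries regardless of score.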
Taxonomy
Topics: Visual Attention and Saliency Detection · Generative Adversarial Networks and Image Synthesis · Image and Video Quality Assessment
