Past- and Future-Informed KV Cache Policy with Salience Estimation in Autoregressive Video Diffusion

Hanmo Chen; Chenghao Xu; Xu Yang; Xuan Chen; Cheng Deng

arXiv:2601.21896 · cs.CV · February 5, 2026

Open Access

TL;DR

This paper introduces PaFu-KV, a salience-aware cache policy for autoregressive video generation that improves quality and efficiency by selectively retaining important tokens based on estimated salience scores.

Contribution

The paper proposes the Past- and Future-Informed KV Cache Policy (PaFu-KV), which adds a lightweight Salience Estimation Head, distilled from a bidirectional teacher, to decide which tokens the KV cache retains during long-term video generation.

Findings

01. Improves video quality by retaining critical tokens.

02. Reduces memory footprint and accelerates inference.

03. Maintains high-fidelity video generation on benchmarks.

Abstract

Video generation is pivotal to digital media creation, and recent advances in autoregressive video generation have markedly enhanced the efficiency of real-time video synthesis. However, existing approaches generally rely on heuristic KV Cache policies, which ignore differences in token importance in long-term video generation. This leads to the loss of critical spatiotemporal information and the accumulation of redundant, invalid cache entries, thereby degrading video generation quality and efficiency. To address this limitation, we first observe that token contributions to video generation are highly time-heterogeneous and accordingly propose a novel Past- and Future-Informed KV Cache Policy (PaFu-KV). Specifically, PaFu-KV introduces a lightweight Salience Estimation Head distilled from a bidirectional teacher to estimate salience scores, allowing the KV cache to retain informative tokens…
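To make the idea of salience-based retention concrete, here is a minimal Python sketch. It is not the paper's implementation: the abstract does not state the selection rule, so a plain top-k rule under a fixed budget is assumed here, and the names `select_salient_indices`, `prune_kv_cache`, and `budget` are hypothetical. The salience scores are passed in as plain numbers, standing in for whatever the Salience Estimation Head would produce.

```python
from typing import List, Sequence, Tuple


def select_salient_indices(salience: Sequence[float], budget: int) -> List[int]:
    """Return the indices of the `budget` highest-scoring tokens, in original order.

    Assumption: top-k selection; ties are broken in favor of the more recent token.
    """
    if budget <= 0:
        return []
    # Rank by score (descending); on equal scores, prefer the larger (newer) index.
    ranked = sorted(range(len(salience)), key=lambda i: (-salience[i], -i))
    # Restore temporal order so the pruned cache keeps its original sequence.
    return sorted(ranked[:budget])


def prune_kv_cache(
    keys: Sequence, values: Sequence, salience: Sequence[float], budget: int
) -> Tuple[list, list, List[int]]:
    """Keep only the cache entries whose tokens have the highest salience scores."""
    if not (len(keys) == len(values) == len(salience)):
        raise ValueError("keys, values, and salience must have the same length")
    keep = select_salient_indices(salience, budget)
    return [keys[i] for i in keep], [values[i] for i in keep], keep


# Toy cache with five tokens; the scores are made-up illustrative values.
toy_keys = ["k0", "k1", "k2", "k3", "k4"]
toy_values = ["v0", "v1", "v2", "v3", "v4"]
toy_scores = [0.9, 0.1, 0.4, 0.8, 0.2]

kept_keys, kept_values, kept_idx = prune_kv_cache(toy_keys, toy_values, toy_scores, budget=3)
print(kept_idx)  # [0, 2, 3]
```

In this sketch the low-scoring tokens at positions 1 and 4 are evicted regardless of their age, which is the contrast with a purely recency-based heuristic such as a sliding window.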

Taxonomy

Topics: Visual Attention and Saliency Detection · Generative Adversarial Networks and Image Synthesis · Image and Video Quality Assessment