Efficient Autoregressive Video Diffusion with Dummy Head

Hang Guo; Zhaoyang Jia; Jiahao Li; Bin Li; Yuanhao Cai; Jiangshan Wang; Yawei Li; Yan Lu

arXiv:2601.20499 · cs.CV · January 29, 2026

TL;DR

This paper introduces Dummy Forcing, a method to improve autoregressive video diffusion models by optimizing head-wise context usage, resulting in faster video generation with minimal quality loss.

Contribution

We propose Dummy Forcing, a training-free technique that controls which attention heads in multi-head self-attention may access historical context, reducing head-wise context redundancy and accelerating autoregressive video diffusion.
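The idea of giving heads different context access can be illustrated with a minimal NumPy sketch. It assumes each head either attends to the cached past frames plus the current frame, or is a "dummy" head that sees only the current frame. The function name `headwise_attention` and the `is_dummy` flag are hypothetical. The sketch also omits any causal mask inside the current frame, and it is not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def headwise_attention(q, k_hist, v_hist, k_cur, v_cur, is_dummy):
    """Multi-head attention in which each head has its own context.

    q, k_cur, v_cur: (H, T_cur, D) tokens of the current frame.
    k_hist, v_hist:  (H, T_hist, D) cached tokens of past frames.
    is_dummy: (H,) bool; True marks a (hypothetical) dummy head that
              ignores the history cache and attends to the current frame only.
    """
    H, _, D = q.shape
    out = np.empty_like(q)
    for h in range(H):
        if is_dummy[h]:
            # Dummy head: no KV cache of past frames is needed.
            k, v = k_cur[h], v_cur[h]
        else:
            # Regular head: past-frame cache followed by current-frame tokens.
            k = np.concatenate([k_hist[h], k_cur[h]], axis=0)
            v = np.concatenate([v_hist[h], v_cur[h]], axis=0)
        scores = q[h] @ k.T / np.sqrt(D)  # (T_cur, T_k) scaled dot products
        out[h] = softmax(scores) @ v      # (T_cur, D)
    return out
```

In this sketch, only the non-dummy heads need to store `k_hist` and `v_hist`. If a quarter of the heads are dummy heads, the history cache shrinks by about a quarter, and those heads also skip the attention cost over past frames.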

Findings

01

Up to 2.0x speedup in video generation

02

Supports 24.3 FPS with less than 0.5% quality drop

03

Reduces context redundancy in attention heads

Abstract

Autoregressive video diffusion models have recently gained considerable research interest due to their causal modeling and iterative denoising. In this work, we identify that the multi-head self-attention in these models under-utilizes historical frames: approximately 25% of heads attend almost exclusively to the current frame, and discarding their KV caches incurs only minor performance degradation. Building upon this, we propose Dummy Forcing, a simple yet effective method to control context accessibility across different heads. Specifically, the proposed heterogeneous memory allocation reduces head-wise context redundancy, accompanied by dynamic head programming to adaptively classify head types. Moreover, we develop a context packing technique to achieve more aggressive cache compression. Without additional training, Dummy Forcing delivers up to 2.0x speedup over the baseline,…
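The observation that some heads attend almost exclusively to the current frame suggests a simple way to classify heads. The NumPy sketch below uses a hypothetical rule. It measures how much attention probability each head places on current-frame keys and marks the top `ratio` fraction of heads as dummy heads. The default `ratio=0.25` echoes the roughly 25% figure in the abstract. The names `current_frame_mass` and `classify_heads`, the key layout, and the top-k rule are all assumptions. The paper's dynamic head programming may use a different criterion.

```python
import numpy as np

def current_frame_mass(attn, n_cur):
    """Average fraction of attention each head places on the current frame.

    attn: (H, T_q, T_k) row-stochastic attention weights; the last n_cur key
          positions are assumed to belong to the current frame.
    Returns an (H,) array averaged over query positions.
    """
    return attn[:, :, -n_cur:].sum(axis=-1).mean(axis=-1)

def classify_heads(attn, n_cur, ratio=0.25):
    """Mark the `ratio` fraction of heads with the highest current-frame
    mass as dummy heads (hypothetical top-k rule)."""
    mass = current_frame_mass(attn, n_cur)
    n_heads = attn.shape[0]
    n_dummy = int(round(ratio * n_heads))
    order = np.argsort(-mass)  # head indices, descending by current-frame mass
    is_dummy = np.zeros(n_heads, dtype=bool)
    is_dummy[order[:n_dummy]] = True
    return is_dummy, mass
```

A fixed ratio keeps the memory budget predictable. A threshold on `mass`, for example marking heads above 0.9 as dummy, would let the number of dummy heads vary across layers or denoising steps. Re-running the classification during generation would be one way to make it adaptive.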

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Image and Video Quality Assessment · Video Coding and Compression Technologies