PackCache: A Training-Free Acceleration Method for Unified Autoregressive Video Generation via Compact KV-Cache
Kunyang Li, Mubarak Shah, Yuzhang Shang

TL;DR
PackCache is a training-free method that dynamically manages the KV-cache in unified autoregressive video generation, improving inference efficiency and enabling longer sequences by exploiting the spatiotemporal properties of cached tokens.
Contribution
The paper introduces a KV-cache management technique that compacts cached tokens using semantic anchors and temporal decay, speeding up inference without retraining.
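To illustrate the idea of compacting a cache around semantic anchors and temporal decay, here is a minimal sketch in Python. It is not the authors' algorithm: the function names, the geometric `decay` schedule, and the choice to rank tokens inside a frame by an attention score are all assumptions made for illustration. The only parts taken from the summary are that anchor tokens (text and conditioning image) are always kept and that older frames matter less than recent ones.

```python
from typing import List, Sequence


def frame_keep_counts(num_past_frames: int, tokens_per_frame: int,
                      decay: float = 0.5, min_keep: int = 1) -> List[int]:
    """Number of tokens to keep for each past frame (index 0 = oldest).

    Assumption: the per-frame budget shrinks geometrically with the
    distance from the most recent frame. The real schedule may differ.
    """
    counts = []
    for i in range(num_past_frames):
        distance = num_past_frames - 1 - i  # 0 for the most recent frame
        keep = int(tokens_per_frame * (decay ** distance))
        counts.append(max(min_keep, min(tokens_per_frame, keep)))
    return counts


def compact_indices(num_anchor: int,
                    frame_scores: Sequence[Sequence[float]],
                    decay: float = 0.5) -> List[int]:
    """Cache positions to keep: every anchor, plus the top-scoring tokens per frame.

    Cache layout assumed here: [anchor tokens | frame 0 | frame 1 | ...],
    with every frame holding the same number of tokens.
    `frame_scores[f][j]` is a hypothetical importance score (for example,
    accumulated attention) for token j of frame f.
    """
    kept = list(range(num_anchor))  # anchors are never evicted
    if not frame_scores:
        return kept
    tokens_per_frame = len(frame_scores[0])
    counts = frame_keep_counts(len(frame_scores), tokens_per_frame, decay)
    for f, (scores, k) in enumerate(zip(frame_scores, counts)):
        base = num_anchor + f * tokens_per_frame
        # Rank tokens in this frame by score and keep the best k.
        top = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]
        kept.extend(base + j for j in sorted(top))  # preserve original order
    return kept
```

For example, with 2 anchor tokens and two past frames of 4 tokens each, `compact_indices(2, [[0.1, 0.9, 0.3, 0.2], [0.5, 0.4, 0.8, 0.1]])` keeps both anchors, the two highest-scoring tokens of the older frame, and all four tokens of the most recent frame.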
Findings
Accelerates 48-frame video generation by 1.7x to 2.2x
Achieves a 2.6x to 3.7x speedup on the final frames
Enables longer-sequence video generation with reduced memory usage
Abstract
A unified autoregressive model is a Transformer-based framework that addresses diverse multimodal tasks (e.g., text, image, video) as a single sequence modeling problem under a shared token space. Such models rely on the KV-cache mechanism to reduce attention computation from O(T^2) to O(T); however, KV-cache size grows linearly with the number of generated tokens, and it rapidly becomes the dominant bottleneck limiting inference efficiency and generative length. Unified autoregressive video generation inherits this limitation. Our analysis reveals that KV-cache tokens exhibit distinct spatiotemporal properties: (i) text and conditioning-image tokens act as persistent semantic anchors that consistently receive high attention, and (ii) attention to previous frames naturally decays with temporal distance. Leveraging these observations, we introduce PackCache, a training-free KV-cache…
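The linear growth of the KV-cache described in the abstract can be made concrete with a short calculation. The sketch below uses the standard accounting for a Transformer KV-cache (one key and one value vector per token, per head, per layer); the model dimensions in the usage note are hypothetical and are not taken from the paper.

```python
def kv_cache_bytes(num_tokens: int, num_layers: int, num_kv_heads: int,
                   head_dim: int, bytes_per_elem: int = 2) -> int:
    """Approximate KV-cache size in bytes for a single sequence.

    The leading factor of 2 accounts for storing both keys and values.
    bytes_per_elem=2 corresponds to 16-bit storage (an assumption).
    """
    return 2 * num_layers * num_kv_heads * head_dim * num_tokens * bytes_per_elem


def tokens_after_frames(num_anchor_tokens: int, tokens_per_frame: int,
                        num_frames: int) -> int:
    """Total cached tokens once `num_frames` frames have been generated."""
    return num_anchor_tokens + tokens_per_frame * num_frames
```

With illustrative values such as `kv_cache_bytes(tokens_after_frames(256, 1024, 48), num_layers=32, num_kv_heads=32, head_dim=128)`, the cache size scales in direct proportion to the number of frames generated, which is why an uncompacted cache becomes the bottleneck for long videos.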
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Image Enhancement Techniques · Computer Graphics and Visualization Techniques
