Quant VideoGen: Auto-Regressive Long Video Generation via 2-Bit KV-Cache Quantization

Haocheng Xi; Shuo Yang; Yilong Zhao; Muyang Li; Han Cai; Xingyang Li; Yujun Lin; Zhuoyang Zhang; Jintao Zhang; Xiuyu Li; Zhiying Xu; Jun Wu; Chenfeng Xu; Ion Stoica; Song Han; Kurt Keutzer

arXiv:2602.02958 · cs.LG · May 7, 2026

TL;DR

Quant VideoGen is a training-free 2-bit KV cache quantization framework for autoregressive video generation. It cuts KV cache memory by up to 7.0 times, enabling longer, high-quality video synthesis on widely available hardware.

Contribution

The paper proposes a training-free, multi-stage residual quantization method leveraging video redundancy, improving memory efficiency and generation quality in autoregressive video diffusion models.

Findings

01. Reduces KV cache memory by up to 7.0 times

02. Achieves less than 4% latency overhead

03. Outperforms existing baselines in quality on multiple benchmarks
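
To see why a 2-bit cache typically compresses by somewhat less than the nominal ratio, here is a minimal back-of-the-envelope sketch. Everything in it is an assumption for illustration: the 16-bit baseline, the group size, and the per-group scale and zero-point metadata are hypothetical parameters, not values taken from the paper.

```python
def compression_ratio(bits_full=16, bits_quant=2, group_size=64,
                      scale_bits=16, zero_bits=16):
    """Effective compression of a group-quantized tensor.

    All defaults are hypothetical. Each group of `group_size` elements
    stores `bits_quant` bits per element plus one scale and one zero
    point, so the metadata cost is amortized over the group.
    """
    effective_bits = bits_quant + (scale_bits + zero_bits) / group_size
    return bits_full / effective_bits


if __name__ == "__main__":
    for g in (32, 64, 128):
        print(g, round(compression_ratio(group_size=g), 2))
```

Under these assumptions the ratio always stays below the nominal 16/2 = 8, and it moves closer to 8 as the group size grows, because the metadata is shared by more elements.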

Abstract

Despite rapid progress in autoregressive video diffusion, an emerging system-algorithm bottleneck limits both deployability and generation capability: KV cache memory. In autoregressive video generation models, the KV cache grows with generation history and quickly dominates GPU memory, often exceeding 30 GB, preventing deployment on widely available hardware. More critically, constrained KV cache budgets restrict the effective working memory, directly degrading long-horizon consistency in identity, layout, and motion. To address this challenge, we present Quant VideoGen (QVG), a training-free KV cache quantization framework for autoregressive video diffusion models. QVG leverages video spatiotemporal redundancy through Semantic-Aware Smoothing, producing low-magnitude, quantization-friendly residuals. It further introduces Progressive Residual Quantization, a coarse-to-fine multi-stage…
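
The two ideas named in the abstract can be illustrated with a generic sketch. This is not the authors' implementation: the abstract is truncated, so the k-means grouping, the per-row min/max quantizer, and the function names `smooth_by_centroids`, `quantize_uniform`, and `progressive_residual_quantize` are all assumptions chosen only to show how subtracting a shared centroid yields residuals, and how quantizing the leftover error in stages refines the approximation.

```python
import numpy as np


def quantize_uniform(x, bits=2):
    """Per-row asymmetric uniform quantization; returns dequantized values."""
    levels = 2**bits - 1
    lo = x.min(axis=-1, keepdims=True)
    hi = x.max(axis=-1, keepdims=True)
    # Guard against constant rows, where hi == lo.
    scale = np.where(hi > lo, (hi - lo) / levels, 1.0)
    codes = np.clip(np.round((x - lo) / scale), 0, levels)
    return codes * scale + lo


def smooth_by_centroids(x, num_groups, iters=10):
    """Hypothetical smoothing step: group similar tokens with a small
    k-means and return (centroids, assignments, residuals)."""
    init = np.linspace(0, len(x) - 1, num_groups).astype(int)
    centroids = x[init].copy()
    for _ in range(iters):
        dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        assign = dists.argmin(axis=1)
        for g in range(num_groups):
            members = x[assign == g]
            if len(members):
                centroids[g] = members.mean(axis=0)
    # Final assignment against the updated centroids.
    dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    assign = dists.argmin(axis=1)
    return centroids, assign, x - centroids[assign]


def progressive_residual_quantize(x, bits=2, stages=2):
    """Generic multi-stage residual quantization: each stage quantizes
    the error left by the previous stages (coarse to fine)."""
    approx = np.zeros_like(x)
    for _ in range(stages):
        approx = approx + quantize_uniform(x - approx, bits)
    return approx


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    keys = rng.normal(size=(256, 64))  # stand-in for cached key vectors
    centroids, assign, residual = smooth_by_centroids(keys, num_groups=8)
    recon = centroids[assign] + progressive_residual_quantize(residual)
    print("max abs error:", np.abs(keys - recon).max())
```

In this sketch each extra stage costs additional bits per element, so the stage count trades memory for reconstruction error. How QVG balances that trade-off is not recoverable from the truncated abstract.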

Code & Models

Repositories

svg-project/Quant-VideoGen (GitHub)
