ViD-GPT: Introducing GPT-style Autoregressive Generation in Video Diffusion Models
Kaifeng Gao, Jiaxin Shi, Hanwang Zhang, Chunping Wang, Jun Xiao

TL;DR
ViD-GPT introduces a causal, autoregressive approach with GPT-style generation and kv-cache acceleration to improve long video generation quality and efficiency in diffusion models.
Contribution
It pioneers the integration of GPT-style causal attention and prompt-based conditioning into video diffusion models for long video synthesis.
Findings
Achieves state-of-the-art results in long video generation.
Significantly improves inference speed with kv-cache mechanism.
Demonstrates superior qualitative and quantitative performance.
Abstract
With the advance of diffusion models, today's video generation has achieved impressive quality. But generating temporal consistent long videos is still challenging. A majority of video diffusion models (VDMs) generate long videos in an autoregressive manner, i.e., generating subsequent clips conditioned on last frames of previous clip. However, existing approaches all involve bidirectional computations, which restricts the receptive context of each autoregression step, and results in the model lacking long-term dependencies. Inspired from the huge success of large language models (LLMs) and following GPT (generative pre-trained transformer), we bring causal (i.e., unidirectional) generation into VDMs, and use past frames as prompt to generate future frames. For Causal Generation, we introduce causal temporal attention into VDM, which forces each generated frame to depend on its previous…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsGenerative Adversarial Networks and Image Synthesis
MethodsAttention Is All You Need · Cosine Annealing · Byte Pair Encoding · Attention Dropout · Weight Decay · Dropout · Adam · Linear Warmup With Cosine Annealing · Linear Layer · Dense Connections
