AID: Adapting Image2Video Diffusion Models for Instruction-guided Video Prediction
Zhen Xing, Qi Dai, Zejia Weng, Zuxuan Wu, Yu-Gang Jiang

TL;DR
This paper introduces AID, a novel method that adapts Image2Video diffusion models with instruction-guided control for improved video prediction, achieving state-of-the-art results across multiple datasets.
Contribution
The paper proposes a new framework combining a dual query transformer and adapters to transfer pretrained Image2Video models for instruction-guided video prediction with minimal training.
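This summary does not describe the internal design of AID's adapters. As a hedged illustration of why adapters keep training cost low, the sketch below shows a generic residual bottleneck adapter (Houlsby-style). It is not the paper's architecture. The class name `BottleneckAdapter`, the ReLU nonlinearity, and the dimensions are all hypothetical choices made for the example. Only the two small projection matrices would be trained, while the pretrained Image2Video backbone that produces `x` stays frozen.

```python
import numpy as np


class BottleneckAdapter:
    """Hypothetical residual bottleneck adapter: y = x + relu(x @ W_down) @ W_up.

    Generic illustration only, not AID's actual adapter design.
    """

    def __init__(self, dim: int, bottleneck: int, rng: np.random.Generator):
        # Down-projection to a small bottleneck width (trainable).
        self.w_down = rng.normal(scale=0.02, size=(dim, bottleneck))
        # Zero-initialized up-projection, so the adapter starts as an identity map
        # and the frozen backbone's behavior is unchanged before training.
        self.w_up = np.zeros((bottleneck, dim))

    def __call__(self, x: np.ndarray) -> np.ndarray:
        hidden = np.maximum(x @ self.w_down, 0.0)  # ReLU nonlinearity
        return x + hidden @ self.w_up              # residual connection

    def num_params(self) -> int:
        # Trainable parameters: both projection matrices.
        return self.w_down.size + self.w_up.size


rng = np.random.default_rng(0)
adapter = BottleneckAdapter(dim=1024, bottleneck=64, rng=rng)
features = rng.normal(size=(4, 1024))  # stand-in for frozen backbone activations
print(adapter.num_params())            # 2 * 1024 * 64 trainable parameters
```

With `dim=1024` and `bottleneck=64`, the adapter trains 131,072 parameters. A single full 1024×1024 linear layer has 1,048,576. That gap is the general reason adapter-based transfer can work with little training.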
Findings
Significantly better (lower) FVD scores on multiple datasets.
Outperforms existing state-of-the-art methods in instruction-guided video prediction.
Demonstrates effective transfer of video dynamic priors with minimal training.
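FVD (Fréchet Video Distance) fits a Gaussian to features of real videos and another to features of generated videos, then measures the Fréchet distance between them, d² = ‖μ_r − μ_g‖² + Tr(Σ_r + Σ_g − 2(Σ_r Σ_g)^{1/2}). Lower is better. The sketch below computes only this distance. It assumes the per-video feature vectors are already available; real FVD extracts them with a pretrained video network (commonly I3D), which is omitted here. Random arrays stand in for those features, and the function names are hypothetical.

```python
import numpy as np


def _sqrtm_psd(mat: np.ndarray) -> np.ndarray:
    """Square root of a symmetric positive semi-definite matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, 0.0, None)  # drop tiny negative eigenvalues from round-off
    return (vecs * np.sqrt(vals)) @ vecs.T


def frechet_distance(feats_real: np.ndarray, feats_gen: np.ndarray) -> float:
    """Fréchet distance between Gaussians fitted to two (N, D) feature sets."""
    mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_g = np.cov(feats_gen, rowvar=False)
    # Tr((cov_r cov_g)^{1/2}) equals the sum of square roots of the eigenvalues of
    # cov_r^{1/2} cov_g cov_r^{1/2}, which is symmetric PSD and numerically stable.
    sqrt_r = _sqrtm_psd(cov_r)
    middle = sqrt_r @ cov_g @ sqrt_r
    middle = (middle + middle.T) / 2.0  # symmetrize against round-off
    tr_covmean = np.sqrt(np.clip(np.linalg.eigvalsh(middle), 0.0, None)).sum()
    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(cov_r) + np.trace(cov_g) - 2.0 * tr_covmean)


rng = np.random.default_rng(0)
real = rng.normal(size=(2000, 8))           # stand-in "real video" features
close = rng.normal(size=(2000, 8))          # same distribution
shifted = rng.normal(size=(2000, 8)) + 3.0  # mean shifted by 3 in every dimension
print(frechet_distance(real, close), frechet_distance(real, shifted))
```

Samples from the same distribution give a distance near zero. The shifted set adds roughly 8 × 3² = 72 through the mean term, which shows why a lower FVD indicates generated videos whose feature statistics are closer to real ones.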
Abstract
Text-guided video prediction (TVP) involves predicting the motion of future frames from the initial frame according to an instruction, which has wide applications in virtual reality, robotics, and content creation. Previous TVP methods have made significant breakthroughs by adapting Stable Diffusion to this task. However, they struggle with frame consistency and temporal stability, primarily due to the limited scale of video datasets. We observe that pretrained Image2Video diffusion models possess good priors for video dynamics but lack textual control. Hence, transferring Image2Video models to leverage their video dynamic priors while injecting instruction control to generate controllable videos is a task that is both meaningful and challenging. To achieve this, we introduce the Multi-Modal Large Language Model (MLLM) to predict future video states based on initial frames and text…
Taxonomy
Topics: Image and Video Quality Assessment · Video Analysis and Summarization · Online Learning and Analytics
Methods: Diffusion
