Control-A-Video: Controllable Text-to-Video Diffusion Models with Motion Prior and Reward Feedback Learning
Weifeng Chen, Yatai Ji, Jie Wu, Hefeng Wu, Pan Xie, Jiashi Li, Xin Xia, Xuefeng Xiao, Liang Lin

TL;DR
Control-A-Video introduces a controllable text-to-video diffusion model that incorporates content and motion priors, along with reward feedback learning, to generate high-quality, motion-consistent videos guided by text prompts and control maps.
Contribution
The paper presents novel strategies for integrating content and motion priors into diffusion-based video generation and introduces a reward feedback learning algorithm for improved quality and consistency.
Findings
Produces higher-quality videos than existing methods.
Achieves better motion consistency and relevance.
Demonstrates effectiveness of reward feedback learning.
Abstract
Recent advances in text-to-image (T2I) diffusion models have enabled impressive image generation capabilities guided by text prompts. However, extending these techniques to video generation remains challenging, with existing text-to-video (T2V) methods often struggling to produce high-quality and motion-consistent videos. In this work, we introduce Control-A-Video, a controllable T2V diffusion model that can generate videos conditioned on text prompts and reference control maps like edge and depth maps. To tackle video quality and motion consistency issues, we propose novel strategies to incorporate content prior and motion prior into the diffusion-based generation process. Specifically, we employ a first-frame condition scheme to transfer video generation from the image domain. Additionally, we introduce residual-based and optical flow-based noise initialization to infuse motion priors…
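The abstract names residual-based and optical flow-based noise initialization but does not spell out how they work. The NumPy sketch below shows one plausible reading of each idea and is not the authors' implementation. It assumes that "infusing a motion prior" means correlating the initial noise across frames: regions that stay static (small frame-to-frame residual) reuse the previous frame's noise, or noise is carried along an optical flow field. The function names, the `threshold` parameter, the `(T, H, W, C)` layout, the backward-flow convention and the nearest-neighbour warping are all hypothetical choices. In a real latent diffusion model, the noise would live in latent space rather than pixel space.

```python
import numpy as np


def residual_noise_init(frames, threshold=0.1, rng=None):
    """Hypothetical residual-based noise initialization (illustrative only).

    frames: float array of shape (T, H, W, C), values roughly in [0, 1].
    Pixels whose change from the previous frame is at most `threshold`
    inherit that frame's noise; pixels that change get fresh Gaussian noise.
    Returns a noise array with the same shape as `frames`.
    """
    rng = np.random.default_rng() if rng is None else rng
    num_frames = frames.shape[0]
    noise = np.empty(frames.shape, dtype=np.float64)
    noise[0] = rng.standard_normal(frames.shape[1:])
    for t in range(1, num_frames):
        # Per-pixel residual magnitude averaged over channels, shape (H, W, 1).
        residual = np.abs(frames[t] - frames[t - 1]).mean(axis=-1, keepdims=True)
        moving = residual > threshold
        fresh = rng.standard_normal(frames.shape[1:])
        # Static pixels reuse the previous frame's noise; moving pixels resample.
        noise[t] = np.where(moving, fresh, noise[t - 1])
    return noise


def flow_noise_init(flows, shape, rng=None):
    """Hypothetical optical flow-based noise initialization (illustrative only).

    flows: array of shape (T - 1, H, W, 2). flows[t - 1, y, x] = (dx, dy) is an
    assumed backward flow: pixel (x, y) in frame t came from (x + dx, y + dy)
    in frame t - 1.
    shape: (T, H, W, C) shape of the noise to generate.
    Each frame's noise is the previous frame's noise gathered at the
    flow-displaced location (nearest neighbour); locations that fall outside
    the frame receive fresh Gaussian noise.
    """
    rng = np.random.default_rng() if rng is None else rng
    num_frames, height, width, channels = shape
    noise = np.empty(shape, dtype=np.float64)
    noise[0] = rng.standard_normal((height, width, channels))
    ys, xs = np.meshgrid(np.arange(height), np.arange(width), indexing="ij")
    for t in range(1, num_frames):
        src_x = np.rint(xs + flows[t - 1, ..., 0]).astype(int)
        src_y = np.rint(ys + flows[t - 1, ..., 1]).astype(int)
        valid = (src_x >= 0) & (src_x < width) & (src_y >= 0) & (src_y < height)
        # Clip indices so the gather is always in bounds; invalid pixels are
        # replaced by fresh noise in the np.where below.
        warped = noise[t - 1][np.clip(src_y, 0, height - 1), np.clip(src_x, 0, width - 1)]
        fresh = rng.standard_normal((height, width, channels))
        noise[t] = np.where(valid[..., None], warped, fresh)
    return noise
```

In this sketch, passing an all-zero flow field to `flow_noise_init` yields identical noise for every frame, and a static video passed to `residual_noise_init` does the same. One design caveat is that nearest-neighbour warping can copy a single noise value to several pixels. Each pixel's noise stays standard normal, but samples within a frame are no longer independent. A real implementation would likely need to account for this.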
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Computer Graphics and Visualization Techniques · Human Motion and Animation
Methods: Diffusion
