Control-A-Video: Controllable Text-to-Video Diffusion Models with Motion Prior and Reward Feedback Learning

Weifeng Chen; Yatai Ji; Jie Wu; Hefeng Wu; Pan Xie; Jiashi Li; Xin Xia; Xuefeng Xiao; Liang Lin

arXiv:2305.13840 · cs.CV · August 13, 2024 · 28 citations

TL;DR

Control-A-Video introduces a controllable text-to-video diffusion model that incorporates content and motion priors, along with reward feedback learning, to generate high-quality, motion-consistent videos guided by text prompts and control maps.

Contribution

The paper presents novel strategies for integrating content and motion priors into diffusion-based video generation and introduces a reward feedback learning algorithm for improved quality and consistency.

Findings

1. Produces higher-quality videos than existing methods.
2. Achieves better motion consistency and relevance.
3. Demonstrates effectiveness of reward feedback learning.

Abstract

Recent advances in text-to-image (T2I) diffusion models have enabled impressive image generation capabilities guided by text prompts. However, extending these techniques to video generation remains challenging, with existing text-to-video (T2V) methods often struggling to produce high-quality and motion-consistent videos. In this work, we introduce Control-A-Video, a controllable T2V diffusion model that can generate videos conditioned on text prompts and reference control maps like edge and depth maps. To tackle video quality and motion consistency issues, we propose novel strategies to incorporate content prior and motion prior into the diffusion-based generation process. Specifically, we employ a first-frame condition scheme to transfer video generation from the image domain. Additionally, we introduce residual-based and optical flow-based noise initialization to infuse motion priors…
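The residual-based noise initialization mentioned in the abstract can be illustrated with a minimal sketch. It rests on an assumption that the truncated abstract does not spell out: noise is carried over from the previous frame wherever pixels barely change and resampled wherever they change. The function name `residual_noise_init`, the `threshold` parameter, and the grayscale `(T, H, W)` layout are hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np

def residual_noise_init(frames, threshold=0.1, seed=0):
    """Illustrative sketch (assumed scheme, not the paper's code).

    frames: float array of shape (T, H, W) with values in [0, 1].
    Returns Gaussian noise of the same shape in which static regions
    share noise across consecutive frames.
    """
    rng = np.random.default_rng(seed)
    frame_shape = frames.shape[1:]
    noise = np.empty(frames.shape, dtype=np.float64)

    # The first frame gets ordinary i.i.d. Gaussian noise.
    noise[0] = rng.standard_normal(frame_shape)

    for t in range(1, frames.shape[0]):
        # Residual between consecutive frames marks the moving regions.
        changed = np.abs(frames[t] - frames[t - 1]) > threshold
        fresh = rng.standard_normal(frame_shape)
        # Resample noise where the content changed; reuse it elsewhere.
        noise[t] = np.where(changed, fresh, noise[t - 1])

    return noise
```

Under this assumption, a video whose top half changes between frames 0 and 1 would keep identical noise in the bottom half of both frames, which is one way the initial noise could carry a motion prior into the denoising process.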

Code & Models

Repositories

weifeng-chen/control-a-video (PyTorch, official)

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Computer Graphics and Visualization Techniques · Human Motion and Animation

Methods: Diffusion