VideoComposer: Compositional Video Synthesis with Motion Controllability
Xiang Wang, Hangjie Yuan, Shiwei Zhang, Dayou Chen, Jiuniu Wang, Yingya Zhang, Yujun Shen, Deli Zhao, Jingren Zhou

TL;DR
VideoComposer introduces a novel method for controllable video synthesis that integrates textual, spatial, and temporal conditions, including motion vectors, to generate videos with high inter-frame consistency and flexible content control.
Contribution
It presents a compositional framework for video synthesis that uses motion vectors from compressed videos as an explicit temporal control signal, together with a Spatio-Temporal Condition encoder (STC-encoder) that serves as a unified interface for spatial and temporal conditions.
Findings
Achieves high inter-frame consistency in synthesized videos.
Enables control via multiple input modalities like text, sketches, and motion.
Demonstrates flexible and precise video content creation.
Abstract
The pursuit of controllability as a higher standard of visual content creation has yielded remarkable progress in customizable image synthesis. However, achieving controllable video synthesis remains challenging due to the large variation of temporal dynamics and the requirement of cross-frame temporal consistency. Based on the paradigm of compositional generation, this work presents VideoComposer that allows users to flexibly compose a video with textual conditions, spatial conditions, and more importantly temporal conditions. Specifically, considering the characteristic of video data, we introduce the motion vector from compressed videos as an explicit control signal to provide guidance regarding temporal dynamics. In addition, we develop a Spatio-Temporal Condition encoder (STC-encoder) that serves as a unified interface to effectively incorporate the spatial and temporal relations…
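As a rough illustration of what a motion-vector signal contains, the sketch below estimates one displacement per block between two frames using exhaustive block matching, the general idea behind the motion vectors stored by video codecs. It is not the paper's pipeline, which takes the vectors from compressed videos rather than recomputing them. The names `block_motion_vectors` and `make_frame`, the block size, and the search radius are all assumptions made for this example.

```python
def block_motion_vectors(prev, curr, block=4, search=2):
    """Return {(by, bx): (dy, dx)} for each block of `curr`.

    (dy, dx) is the offset into `prev` whose block best matches the
    block of `curr` at (by, bx), by sum of absolute differences (SAD).
    Hypothetical helper for illustration only.
    """
    height, width = len(curr), len(curr[0])
    # Try small displacements first so ties resolve to the shortest vector.
    offsets = sorted(
        ((dy, dx)
         for dy in range(-search, search + 1)
         for dx in range(-search, search + 1)),
        key=lambda o: (abs(o[0]) + abs(o[1]), o),
    )
    vectors = {}
    for by in range(0, height - block + 1, block):
        for bx in range(0, width - block + 1, block):
            best, best_cost = (0, 0), None
            for dy, dx in offsets:
                y0, x0 = by + dy, bx + dx
                # Skip candidates that fall outside the reference frame.
                if y0 < 0 or x0 < 0 or y0 + block > height or x0 + block > width:
                    continue
                cost = sum(
                    abs(curr[by + i][bx + j] - prev[y0 + i][x0 + j])
                    for i in range(block)
                    for j in range(block)
                )
                if best_cost is None or cost < best_cost:
                    best, best_cost = (dy, dx), cost
            vectors[(by, bx)] = best
    return vectors


def make_frame(top, left, size=8):
    """Build a black size x size frame with a bright 2x2 square at (top, left)."""
    frame = [[0] * size for _ in range(size)]
    for y in range(top, top + 2):
        for x in range(left, left + 2):
            frame[y][x] = 255
    return frame


# The square moves one pixel down and one pixel right between frames.
prev_frame = make_frame(4, 4)
curr_frame = make_frame(5, 5)
vectors = block_motion_vectors(prev_frame, curr_frame)
```

Here each vector points from a block in the current frame back to its best match in the previous frame, so the block containing the moving square gets a non-zero vector while static background blocks get `(0, 0)`. A grid of such vectors per frame is the kind of sparse, motion-only signal that could be fed to a generator as a temporal condition.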
Taxonomy
Topics: Computer Graphics and Visualization Techniques · Generative Adversarial Networks and Image Synthesis · Advanced Vision and Imaging
