Motion-Conditioned Diffusion Model for Controllable Video Synthesis
Tsai-Shien Chen, Chieh Hubert Lin, Hung-Yu Tseng, Tsung-Yi Lin, Ming-Hsuan Yang

TL;DR
MCDiff is a novel diffusion-based model that enables controllable video synthesis from a starting image and sparse motion inputs, achieving state-of-the-art quality and diversity.
Contribution
We introduce MCDiff, a conditional diffusion model that combines flow completion and diffusion techniques for controllable, stroke-guided video synthesis.
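The sketch below shows one way "stroke-guided" input could be turned into something a flow-based model can consume. It assumes a hypothetical stroke format: each stroke is a list of (y, x) points, and each consecutive pair is read as "the pixel at the first point moves to the second". The function name `strokes_to_sparse_flow` and this encoding are illustrative assumptions, not MCDiff's actual input pipeline.

```python
import numpy as np


def strokes_to_sparse_flow(strokes, height, width):
    """Rasterize user strokes into a sparse flow map and a validity mask.

    Hypothetical format (not from the paper): each stroke is a list of (y, x)
    points; each consecutive pair (p, q) means "the pixel at p moves to q".
    Returns:
        flow: (height, width, 2) float array of (dy, dx), zero where unspecified
        mask: (height, width) bool array, True where a displacement was given
    """
    flow = np.zeros((height, width, 2), dtype=np.float32)
    mask = np.zeros((height, width), dtype=bool)
    for stroke in strokes:
        # Walk the stroke segment by segment.
        for (y0, x0), (y1, x1) in zip(stroke[:-1], stroke[1:]):
            r, c = int(round(y0)), int(round(x0))
            # Silently drop segments that start outside the image.
            if 0 <= r < height and 0 <= c < width:
                flow[r, c] = (y1 - y0, x1 - x0)
                mask[r, c] = True
    return flow, mask
```

For example, `strokes_to_sparse_flow([[(1, 1), (2, 3), (2, 5)]], 6, 6)` marks two pixels, (1, 1) with displacement (1, 2) and (2, 3) with displacement (0, 2), and leaves every other pixel unspecified. That sparsity is exactly the ambiguity the abstract says the flow completion stage has to resolve.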
Findings
Achieves state-of-the-art visual quality in stroke-guided video synthesis.
Effectively predicts dense motion from sparse inputs using flow completion.
Demonstrates versatility on diverse content and motion in MPII Human Pose experiments.
Abstract
Recent advancements in diffusion models have greatly improved the quality and diversity of synthesized content. To harness the expressive power of diffusion models, researchers have explored various controllable mechanisms that allow users to intuitively guide the content synthesis process. Although the latest efforts have primarily focused on video synthesis, there has been a lack of effective methods for controlling and describing desired content and motion. In response to this gap, we introduce MCDiff, a conditional diffusion model that generates a video from a starting image frame and a set of strokes, which allow users to specify the intended content and dynamics for synthesis. To tackle the ambiguity of sparse motion inputs and achieve better synthesis quality, MCDiff first utilizes a flow completion model to predict the dense video motion based on the semantic understanding of…
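The flow completion stage takes sparse motion and produces dense motion for every pixel. MCDiff uses a learned model for this. The sketch below swaps in classical inverse-distance-weighted (IDW) interpolation purely to illustrate the input/output contract (a sparse H×W×2 flow map plus a mask in, a dense H×W×2 flow field out). IDW has no semantic understanding of the image, so it is not the paper's method. The function name `complete_flow_idw` and its `power` and `eps` parameters are hypothetical.

```python
import numpy as np


def complete_flow_idw(sparse_flow, mask, power=2.0, eps=1e-6):
    """Densify a sparse flow map by inverse-distance-weighted interpolation.

    Stand-in for a learned flow completion model (assumption, not MCDiff's).
    sparse_flow: (H, W, 2) array of (dy, dx); only entries where mask is True are used.
    mask: (H, W) bool array marking user-specified pixels.
    Returns an (H, W, 2) dense flow field.
    """
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        # No motion specified anywhere: predict zero motion.
        return np.zeros_like(sparse_flow, dtype=np.float64)
    points = np.stack([ys, xs], axis=-1).astype(np.float64)  # (N, 2) control points
    vectors = sparse_flow[ys, xs].astype(np.float64)         # (N, 2) their displacements
    h, w = mask.shape
    gy, gx = np.mgrid[0:h, 0:w]
    grid = np.stack([gy, gx], axis=-1).astype(np.float64)    # (H, W, 2) pixel coordinates
    # Euclidean distance from every pixel to every control point: (H, W, N).
    dist = np.linalg.norm(grid[:, :, None, :] - points[None, None], axis=-1)
    # Nearer control points get larger weights; eps avoids division by zero.
    weights = 1.0 / (dist + eps) ** power
    weights /= weights.sum(axis=-1, keepdims=True)
    # Weighted average of the control displacements at each pixel.
    return np.einsum("hwn,nc->hwc", weights, vectors)
```

At a specified pixel the output essentially reproduces the user's vector, and between two strokes it blends them smoothly. A learned completion model can instead use image content, for instance moving a whole limb coherently when only one point on it was dragged. That is why the paper relies on semantic understanding rather than purely geometric interpolation.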
Taxonomy
Topics: Computer Graphics and Visualization Techniques · Advanced Vision and Imaging · Generative Adversarial Networks and Image Synthesis
Methods: Diffusion
