Motion-Conditioned Diffusion Model for Controllable Video Synthesis

Tsai-Shien Chen; Chieh Hubert Lin; Hung-Yu Tseng; Tsung-Yi Lin; Ming-Hsuan Yang

arXiv:2304.14404·cs.CV·April 28, 2023·6 cites

Open Access

TL;DR

MCDiff is a novel diffusion-based model that enables controllable video synthesis from a starting image and sparse motion inputs, achieving state-of-the-art quality and diversity.

Contribution

We introduce MCDiff, a conditional diffusion model that combines flow completion and diffusion techniques for controllable, stroke-guided video synthesis.
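For background on the diffusion half of this pairing, the standard DDPM-style conditional training setup is sketched below. The notation is our own: $x_0$ is a clean sample, $\beta_s$ is the noise schedule, and $c$ is a generic conditioning signal standing in for the starting frame and motion inputs. This page does not state MCDiff's exact parameterization or loss, so read this as a generic formulation, not the paper's objective.

```latex
q(x_t \mid x_0) = \mathcal{N}\big(x_t;\ \sqrt{\bar\alpha_t}\,x_0,\ (1-\bar\alpha_t)\,\mathbf{I}\big),
\qquad \bar\alpha_t = \prod_{s=1}^{t} (1-\beta_s)

\mathcal{L} = \mathbb{E}_{x_0,\,c,\,t,\,\epsilon \sim \mathcal{N}(0,\mathbf{I})}
\Big[\big\| \epsilon - \epsilon_\theta\big(\sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon,\ t,\ c\big) \big\|_2^2\Big]
```

The network $\epsilon_\theta$ learns to predict the injected noise given the noisy sample, the timestep, and the condition. Conditioning on $c$ is what lets user-provided inputs steer generation.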

Findings

01. Achieves state-of-the-art visual quality in stroke-guided video synthesis.
02. Effectively predicts dense motion from sparse inputs using flow completion.
03. Demonstrates versatility across diverse content and motion in MPII Human Pose experiments.

Abstract

Recent advancements in diffusion models have greatly improved the quality and diversity of synthesized content. To harness the expressive power of diffusion models, researchers have explored various controllable mechanisms that allow users to intuitively guide the content synthesis process. Although the latest efforts have primarily focused on video synthesis, there has been a lack of effective methods for controlling and describing desired content and motion. In response to this gap, we introduce MCDiff, a conditional diffusion model that generates a video from a starting image frame and a set of strokes, which allow users to specify the intended content and dynamics for synthesis. To tackle the ambiguity of sparse motion inputs and achieve better synthesis quality, MCDiff first utilizes a flow completion model to predict the dense video motion based on the semantic understanding of…
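The abstract's first stage turns sparse strokes into dense motion. The sketch below only illustrates the data involved, under assumptions that are ours, not the paper's:

* Each stroke contributes a single displacement vector at its start pixel.
* Strokes are stored as a sparse flow field with a binary mask.
* The remaining pixels are filled by naive inverse-distance weighting.

That interpolation is a purely geometric stand-in swapped in for illustration. It is not MCDiff's learned flow completion model. `Stroke`, `rasterize_strokes`, and `densify_idw` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass
class Stroke:
    """Hypothetical stroke encoding: a start pixel plus a displacement in pixels."""
    x: int
    y: int
    dx: float
    dy: float


def rasterize_strokes(strokes, height, width):
    """Encode strokes as a sparse flow field and a binary mask (assumed encoding).

    Strokes outside the image are ignored; a later stroke on the same pixel
    overwrites an earlier one.
    """
    flow = [[(0.0, 0.0) for _ in range(width)] for _ in range(height)]
    mask = [[0 for _ in range(width)] for _ in range(height)]
    for s in strokes:
        if 0 <= s.y < height and 0 <= s.x < width:
            flow[s.y][s.x] = (s.dx, s.dy)
            mask[s.y][s.x] = 1
    return flow, mask


def densify_idw(flow, mask, power=2.0):
    """Fill unmasked pixels by inverse-distance weighting of the known vectors.

    This is a naive geometric baseline, NOT a learned flow completion model.
    """
    height, width = len(flow), len(flow[0])
    known = [(y, x, flow[y][x]) for y in range(height)
             for x in range(width) if mask[y][x]]
    if not known:
        # No user motion at all: return a zero flow field.
        return [[(0.0, 0.0)] * width for _ in range(height)]
    dense = []
    for y in range(height):
        row = []
        for x in range(width):
            if mask[y][x]:
                # Keep user-specified motion exactly.
                row.append(flow[y][x])
                continue
            wsum = ux = uy = 0.0
            for ky, kx, (fx, fy) in known:
                # Distance is nonzero because (y, x) is not a known pixel.
                w = 1.0 / math.hypot(x - kx, y - ky) ** power
                wsum += w
                ux += w * fx
                uy += w * fy
            row.append((ux / wsum, uy / wsum))
        dense.append(row)
    return dense


if __name__ == "__main__":
    strokes = [Stroke(1, 1, 2.0, 0.0), Stroke(3, 3, 0.0, -1.0)]
    flow, mask = rasterize_strokes(strokes, height=5, width=5)
    dense = densify_idw(flow, mask)
    print(dense[1][1], dense[2][2])
```

A fill like this blends nearby strokes regardless of which object a pixel belongs to. That is the kind of ambiguity in sparse motion inputs the abstract points to. Per the abstract, MCDiff instead relies on a learned flow completion model informed by semantic understanding.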

Taxonomy

Topics: Computer Graphics and Visualization Techniques · Advanced Vision and Imaging · Generative Adversarial Networks and Image Synthesis

Methods: Diffusion