TL;DR
MoCoGAN is an unsupervised GAN framework for video generation that models content and motion with separate latent codes, so each can be controlled independently. It outperforms existing methods on multiple datasets.
Contribution
The paper presents a new GAN-based model that separates motion and content in video generation without supervision, enhancing control and diversity in generated videos.
Findings
Demonstrates effective unsupervised decomposition of motion and content
Outperforms state-of-the-art methods on several datasets
Enables generation of videos with varied motion and content
Abstract
Visual signals in a video can be divided into content and motion. While content specifies which objects are in the video, motion describes their dynamics. Based on this prior, we propose the Motion and Content decomposed Generative Adversarial Network (MoCoGAN) framework for video generation. The proposed framework generates a video by mapping a sequence of random vectors to a sequence of video frames. Each random vector consists of a content part and a motion part. While the content part is kept fixed, the motion part is realized as a stochastic process. To learn motion and content decomposition in an unsupervised manner, we introduce a novel adversarial learning scheme utilizing both image and video discriminators. Extensive experimental results on several challenging datasets, with qualitative and quantitative comparison to the state-of-the-art approaches, verify the effectiveness of the…
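The latent-sequence idea in the abstract can be illustrated with a small, dependency-free Python sketch under stated assumptions. A content code z_C is drawn once per video and reused for every frame, while each motion code z_M^(k) comes from a recurrent update driven by fresh Gaussian noise, so motion behaves as a stochastic process. The fixed random matrix in `make_motion_step` is a stand-in for the learned recurrent network, and the sizes `DIM_CONTENT` and `DIM_MOTION` and the helpers `sample_frame` and `sample_clip` are hypothetical. The last two show, as we understand the setup, what each discriminator would judge: a single frame for the image discriminator and a run of consecutive frames for the video discriminator. With no generator in the sketch, the latent sequence stands in for the frames.

```python
import math
import random

# Hypothetical sizes chosen for illustration only.
DIM_CONTENT = 4  # length of the content code z_C
DIM_MOTION = 3   # length of each motion code z_M^(k)


def gaussian(dim, rng):
    """Draw a dim-dimensional standard normal vector."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def make_motion_step(dim, rng):
    """Toy recurrence h_k = tanh(W h_{k-1} + eps_k) with a fixed random W.

    Assumption: this stands in for a learned recurrent network that
    produces motion codes; in a real model W would be trained.
    """
    w = [[rng.gauss(0.0, 0.5) for _ in range(dim)] for _ in range(dim)]

    def step(h, eps):
        return [math.tanh(sum(w[i][j] * h[j] for j in range(dim)) + eps[i])
                for i in range(dim)]

    return step


def sample_latent_sequence(num_frames, motion_step, rng):
    """Return one latent per frame, each the concatenation [z_C, z_M^(k)]."""
    z_content = gaussian(DIM_CONTENT, rng)  # sampled once, shared by every frame
    h = [0.0] * DIM_MOTION
    sequence = []
    for _ in range(num_frames):
        h = motion_step(h, gaussian(DIM_MOTION, rng))  # motion evolves stochastically
        sequence.append(z_content + h)  # list concatenation, not elementwise addition
    return sequence


def sample_frame(video, rng):
    """Hypothetical image-discriminator input: one randomly chosen frame."""
    return video[rng.randrange(len(video))]


def sample_clip(video, clip_len, rng):
    """Hypothetical video-discriminator input: clip_len consecutive frames."""
    if clip_len > len(video):
        raise ValueError("clip_len exceeds video length")
    start = rng.randrange(len(video) - clip_len + 1)
    return video[start:start + clip_len]


if __name__ == "__main__":
    rng = random.Random(0)
    step = make_motion_step(DIM_MOTION, rng)
    latents = sample_latent_sequence(8, step, rng)
    # The content slice is identical across frames; the motion slice varies.
    print(all(z[:DIM_CONTENT] == latents[0][:DIM_CONTENT] for z in latents))  # prints True
    print(len(sample_clip(latents, 4, rng)))  # prints 4
```

In a full model each latent would be passed to an image generator to produce one frame. Because z_C is shared, the frames keep the same content, and resampling only the noise that drives the motion recurrence changes the dynamics while leaving the content unchanged.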
