Hierarchical Video Generation for Complex Data
Lluis Castrejon, Nicolas Ballas, Aaron Courville

TL;DR
This paper introduces a hierarchical, coarse-to-fine video generation model whose levels are trained sequentially on partial views of the videos. This lowers computational cost and lets the model scale to high-resolution, multi-frame videos on complex datasets.
Contribution
A novel hierarchical model for video generation that reduces computational complexity and enables high-resolution, multi-frame video synthesis.
Findings
Successfully generates 256x256 videos with 48 frames
Scales to high-resolution videos beyond a few frames
Validated on Kinetics-600 and BDD100K datasets
Abstract
Videos can often be created by first outlining a global description of the scene and then adding local details. Inspired by this, we propose a hierarchical model for video generation that follows a coarse-to-fine approach. First, our model generates a low-resolution video that establishes the global scene structure; subsequent levels in the hierarchy then refine it. We train each level in our hierarchy sequentially on partial views of the videos. This reduces the computational complexity of our generative model, which scales to high-resolution videos beyond a few frames. We validate our approach on Kinetics-600 and BDD100K, for which we train a three-level model capable of generating 256x256 videos with 48 frames.
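The coarse-to-fine pipeline described in the abstract can be illustrated with a minimal shape-level sketch. This is not the authors' implementation: the function names, the per-level doubling in time and space, and the 12-frame 64x64 starting point are assumptions chosen only so that three levels end at 48 frames of 256x256. The learned generators are replaced by random noise and nearest-neighbour upsampling, and `partial_view` shows just one possible reading of "partial views" (a random temporal window).

```python
import numpy as np


def generate_base(rng, frames, size):
    """Stand-in for level 1: a low-resolution video of shape (T, H, W, 3).

    A real first level would be a learned generator; random noise is used
    here only to produce an array of the right shape.
    """
    return rng.random((frames, size, size, 3), dtype=np.float32)


def refine(video, t_factor=2, s_factor=2):
    """Stand-in for a refinement level.

    Nearest-neighbour upsampling in time and space. A real level would be a
    learned network that adds local detail to the coarser video.
    """
    video = np.repeat(video, t_factor, axis=0)  # more frames
    video = np.repeat(video, s_factor, axis=1)  # taller
    video = np.repeat(video, s_factor, axis=2)  # wider
    return video


def partial_view(video, window, rng):
    """Hypothetical 'partial view': a random window of consecutive frames.

    Working on such a window instead of the full clip is one way a level
    could be trained at lower memory cost (an assumption, not the paper's
    exact procedure).
    """
    start = int(rng.integers(0, video.shape[0] - window + 1))
    return video[start:start + window]


def generate(rng, levels=3, base_frames=12, base_size=64):
    """Run the hierarchy: one base level, then (levels - 1) refinements."""
    video = generate_base(rng, base_frames, base_size)
    for _ in range(levels - 1):
        video = refine(video)
    return video


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    video = generate(rng)
    print(video.shape)
    print(partial_view(video, window=6, rng=rng).shape)
```

With the assumed defaults, the base level yields 12 frames at 64x64 and each refinement doubles both the frame count and the spatial size, so the final array matches the 48-frame, 256x256 output size reported in the abstract.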
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Video Analysis and Summarization · Computer Graphics and Visualization Techniques
