Cascaded Video Generation for Videos In-the-Wild
Lluis Castrejon, Nicolas Ballas, Aaron Courville

TL;DR
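This paper introduces a cascaded, coarse-to-fine video generation model: a low-resolution video first establishes the global scene structure, and later cascade levels refine it at larger resolutions. The model is validated on UCF101 and Kinetics-600 and scales to 256x256 videos with 48 frames on BDD100K.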
Contribution
The paper presents a cascaded approach to high-resolution video generation in which each cascade level is trained sequentially on partial views of the videos. This reduces computational complexity and makes the model scalable to high-resolution videos with many frames.
Findings
Competitive with the state of the art on UCF101 and Kinetics-600.
Trained a three-level model on BDD100K that generates 256x256-pixel videos with 48 frames.
Demonstrated scalability to high-resolution, multi-frame videos.
Abstract
Videos can be created by first outlining a global view of the scene and then adding local details. Inspired by this idea, we propose a cascaded model for video generation which follows a coarse-to-fine approach. First, our model generates a low-resolution video, establishing the global scene structure, which is then refined by subsequent cascade levels operating at larger resolutions. We train each cascade level sequentially on partial views of the videos, which reduces the computational complexity of our model and makes it scalable to high-resolution videos with many frames. We empirically validate our approach on UCF101 and Kinetics-600, for which our model is competitive with the state of the art. We further demonstrate the scaling capabilities of our model and train a three-level model on the BDD100K dataset which generates 256x256-pixel videos with 48 frames.
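To make the coarse-to-fine idea concrete, here is a minimal sketch of the output sizes a three-level cascade might produce and how much of the final output a partial view covers. Only the final size (48 frames at 256x256) comes from the abstract. The factor-of-2 scaling per level, the intermediate sizes, the 12-frame window, and the names LevelShape, cascade_shapes, and partial_view_fraction are hypothetical choices for illustration, and treating a "partial view" as a temporal window is one possible reading, not something the abstract specifies.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LevelShape:
    """Output size of one cascade level (hypothetical helper type)."""
    frames: int
    height: int
    width: int

    @property
    def voxels(self) -> int:
        # Number of pixel positions across the whole clip.
        return self.frames * self.height * self.width


def cascade_shapes(final: LevelShape, num_levels: int, factor: int = 2) -> List[LevelShape]:
    """Per-level output shapes, coarsest first, ending at `final`.

    Assumption: every level scales frames, height and width by `factor`.
    """
    shapes = []
    for level in range(num_levels):
        scale = factor ** (num_levels - 1 - level)
        if final.frames % scale or final.height % scale or final.width % scale:
            raise ValueError("final shape must be divisible by the total scale")
        shapes.append(
            LevelShape(final.frames // scale, final.height // scale, final.width // scale)
        )
    return shapes


def partial_view_fraction(full: LevelShape, window_frames: int) -> float:
    """Fraction of a level's output covered by a temporal window (assumed view type)."""
    if not 0 < window_frames <= full.frames:
        raise ValueError("window must contain between 1 and full.frames frames")
    return window_frames / full.frames


# Final size from the abstract: 48 frames at 256x256, three cascade levels.
shapes = cascade_shapes(LevelShape(frames=48, height=256, width=256), num_levels=3)
for index, shape in enumerate(shapes, start=1):
    print(f"level {index}: {shape.frames} x {shape.height} x {shape.width} "
          f"= {shape.voxels} voxels")

# A hypothetical 12-frame training window at the finest level.
print(partial_view_fraction(shapes[-1], window_frames=12))
```

Under these assumptions the coarsest level handles 64 times fewer pixel positions than the finest one, which is why fixing the global structure at low resolution is cheap, and a 12-frame window exposes the finest level to a quarter of the full clip per training example.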
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Advanced Vision and Imaging · Advanced Image Processing Techniques
