WALDO: Future Video Synthesis using Object Layer Decomposition and Parametric Flow Prediction
Guillaume Le Moing, Jean Ponce, Cordelia Schmid

TL;DR
WALDO synthesizes future video frames by decomposing each frame into object layers and predicting a parametric geometric transformation for each layer. It outperforms existing techniques on benchmarks with both urban scenes and nonrigid motions.
Contribution
The paper proposes a novel layered object decomposition approach combined with parametric flow prediction for improved future video frame synthesis.
Findings
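Outperforms state-of-the-art methods on multiple benchmarks: urban videos (Cityscapes, KITTI) and videos with nonrigid motions (UCF-Sports, H3.6M).
Models complex scene motions, including nonrigid ones, by combining the parametric transformations of individual layers.
Demonstrates significant improvements in video prediction accuracy.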
Abstract
This paper presents WALDO (WArping Layer-Decomposed Objects), a novel approach to the prediction of future video frames from past ones. Individual images are decomposed into multiple layers combining object masks and a small set of control points. The layer structure is shared across all frames in each video to build dense inter-frame connections. Complex scene motions are modeled by combining parametric geometric transformations associated with individual layers, and video synthesis is broken down into discovering the layers associated with past frames, predicting the corresponding transformations for upcoming ones and warping the associated object regions accordingly, and filling in the remaining image parts. Extensive experiments on multiple benchmarks including urban videos (Cityscapes and KITTI) and videos featuring nonrigid motions (UCF-Sports and H3.6M) show that our method…
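The abstract does not say which transformation family the layers use or how the warping is carried out, so the following is only a toy sketch of the "warp layer-decomposed objects, then fill the rest" idea under stated assumptions. It assumes, for illustration, that each layer moves by a 2D affine transform fitted by least squares to its control-point displacements, and that layers are forward-splatted back to front with nearest-pixel rounding. Target pixels that no source pixel lands on are the "remaining image parts" an inpainting step would fill. In WALDO the future transformations are predicted from past frames; here they are simply given. The names `fit_affine`, `layer_flow`, and `warp_layers` are hypothetical and are not the authors' code.

```python
import numpy as np


def fit_affine(src_pts, dst_pts):
    """Least-squares 2x3 affine matrix A such that dst ~= A @ [x, y, 1]."""
    src = np.asarray(src_pts, dtype=float)
    dst = np.asarray(dst_pts, dtype=float)
    X = np.hstack([src, np.ones((len(src), 1))])  # (K, 3) homogeneous points
    sol, *_ = np.linalg.lstsq(X, dst, rcond=None)  # (3, 2), solves X @ sol ~= dst
    return sol.T                                   # (2, 3)


def layer_flow(mask, affine):
    """Dense forward flow induced by one layer's affine map, zero outside its mask."""
    h, w = mask.shape
    ys, xs = np.mgrid[0:h, 0:w]
    pts = np.stack([xs, ys, np.ones_like(xs)], axis=-1).astype(float)  # (h, w, 3)
    moved = pts @ affine.T                                              # (h, w, 2)
    return (moved - pts[..., :2]) * mask[..., None]


def warp_layers(frame, masks, affines):
    """Forward-splat each layer with its own affine transform.

    Assumption: masks are boolean (h, w) arrays ordered back to front, so
    later layers overwrite earlier ones. Returns the warped frame and a
    boolean hole map of target pixels that received no source pixel.
    """
    h, w = frame.shape[:2]
    out = np.zeros_like(frame)
    filled = np.zeros((h, w), dtype=bool)
    for mask, affine in zip(masks, affines):
        flow = layer_flow(mask, affine)
        ys, xs = np.nonzero(mask)
        tx = np.rint(xs + flow[ys, xs, 0]).astype(int)
        ty = np.rint(ys + flow[ys, xs, 1]).astype(int)
        keep = (tx >= 0) & (tx < w) & (ty >= 0) & (ty < h)  # drop pixels leaving the frame
        out[ty[keep], tx[keep]] = frame[ys[keep], xs[keep]]
        filled[ty[keep], tx[keep]] = True
    return out, ~filled


# Toy example: a 3x4 frame with a static background layer and a one-pixel
# "object" layer whose control points all shift one pixel to the right.
frame = np.arange(12).reshape(3, 4)
obj = np.zeros((3, 4), dtype=bool)
obj[1, 1] = True
bg = ~obj
A_bg = fit_affine([[0, 0], [3, 0], [0, 2]], [[0, 0], [3, 0], [0, 2]])
A_obj = fit_affine([[1, 1], [2, 1], [1, 2]], [[2, 1], [3, 1], [2, 2]])
out, holes = warp_layers(frame, [bg, obj], [A_bg, A_obj])
# out[1] is [4, 0, 5, 7]; holes is True only at (1, 1), the region left to inpaint.
```

Forward splatting is used here because it exposes disoccluded pixels directly as holes; a backward-warping variant would instead need the layer masks to be known in the target frame.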
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Advanced Vision and Imaging · Human Pose and Action Recognition
