WALDO: Future Video Synthesis using Object Layer Decomposition and Parametric Flow Prediction

Guillaume Le Moing; Jean Ponce; Cordelia Schmid

arXiv:2211.14308 · cs.CV · August 30, 2023

TL;DR

WALDO introduces a layered object decomposition and parametric flow prediction method for future video synthesis, outperforming existing techniques across various benchmarks by modeling complex scene motions effectively.

Contribution

The paper proposes a novel layered object decomposition approach combined with parametric flow prediction for improved future video frame synthesis.

Findings

01. Outperforms state-of-the-art methods on multiple benchmarks.

02. Effectively models complex scene motions including nonrigid movements.

03. Demonstrates significant improvements in video prediction accuracy.

Abstract

This paper presents WALDO (WArping Layer-Decomposed Objects), a novel approach to the prediction of future video frames from past ones. Individual images are decomposed into multiple layers combining object masks and a small set of control points. The layer structure is shared across all frames in each video to build dense inter-frame connections. Complex scene motions are modeled by combining parametric geometric transformations associated with individual layers, and video synthesis is broken down into discovering the layers associated with past frames, predicting the corresponding transformations for upcoming ones and warping the associated object regions accordingly, and filling in the remaining image parts. Extensive experiments on multiple benchmarks including urban videos (Cityscapes and KITTI) and videos featuring nonrigid motions (UCF-Sports and H3.6M), show that our method…
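The idea of combining per-layer parametric transformations into one scene motion can be illustrated with a small NumPy sketch. This is not the authors' implementation: it assumes, purely for illustration, that each layer's transformation is an affine map fitted by least squares to that layer's control points, and that per-layer flows are blended using soft object masks. The function names `fit_affine`, `affine_flow`, and `compose_flow` are hypothetical.

```python
import numpy as np


def fit_affine(src_pts, dst_pts):
    """Least-squares 2x3 affine map sending src control points to dst ones."""
    src = np.asarray(src_pts, dtype=float)
    dst = np.asarray(dst_pts, dtype=float)
    A = np.hstack([src, np.ones((len(src), 1))])      # (K, 3) homogeneous points
    M, *_ = np.linalg.lstsq(A, dst, rcond=None)       # (3, 2) solution
    return M.T                                        # (2, 3)


def affine_flow(M, height, width):
    """Dense flow (H, W, 2) induced by M: displacement of every pixel (x, y)."""
    ys, xs = np.mgrid[0:height, 0:width].astype(float)
    grid = np.stack([xs, ys, np.ones_like(xs)], axis=-1)  # (H, W, 3)
    return grid @ M.T - grid[..., :2]


def compose_flow(masks, transforms):
    """Blend per-layer flows into one field, weighted by soft layer masks."""
    h, w = masks[0].shape
    flows = np.stack([affine_flow(M, h, w) for M in transforms])  # (L, H, W, 2)
    weights = np.stack(masks)[..., None]                          # (L, H, W, 1)
    total = np.clip(weights.sum(axis=0), 1e-8, None)              # avoid divide-by-zero
    return (weights * flows).sum(axis=0) / total


# Toy scene (assumed data): one object layer shifted 2 pixels to the right,
# one static background layer.
height, width = 4, 6
obj = np.zeros((height, width))
obj[1:3, 2:5] = 1.0
bg = 1.0 - obj

src = [(0, 0), (1, 0), (0, 1)]   # control points in the past frame
dst = [(2, 0), (3, 0), (2, 1)]   # assumed positions in the future frame

M_obj = fit_affine(src, dst)     # translation by (2, 0)
M_bg = fit_affine(src, src)      # identity
flow = compose_flow([bg, obj], [M_bg, M_obj])
```

In this toy setup the composed flow is a pure horizontal shift inside the object mask and zero on the background. A richer transformation family could be swapped in for the affine map without changing the compositing step.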

Code & Models

Repositories

16lemoing/waldo (PyTorch, official)

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Advanced Vision and Imaging · Human Pose and Action Recognition