Compositional Video Prediction

Yufei Ye; Maneesh Singh; Abhinav Gupta; Shubham Tulsiani

arXiv:1908.08522·cs.CV·August 23, 2019

TL;DR

This paper introduces a compositional approach to pixel-level future video prediction from a single input image. It models the entities in a scene and their interactions, and produces diverse, realistic stochastic forecasts on two datasets: stacked objects that may fall, and humans performing activities in a gym.

Contribution

The paper proposes a method that predicts the future state of each scene entity separately while reasoning about interactions between entities, composes future frames from these predicted states, and uses a global trajectory-level latent variable to sample diverse, plausible futures.
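The role of the global latent variable can be illustrated with a toy rollout. The sketch below is not the authors' model: the entity states are plain 2-D positions, the interaction rule (each entity drifts toward the mean of the others) is a hand-written stand-in for learned interaction reasoning, and the names `sample_latent`, `step`, and `rollout` are hypothetical. Decoding states into pixels is omitted. The only point it aims to show is that one latent vector is sampled per trajectory and reused at every step, so different samples give different but internally consistent futures.

```python
import random


def sample_latent(dim, rng):
    """Draw one latent vector per trajectory (standard normal is an assumption)."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def step(states, u, coupling=0.1, drift=0.05):
    """Advance every entity by one time step.

    Each entity moves toward the mean of the other entities (a toy stand-in
    for learned interaction reasoning) and is shifted by the shared latent u.
    """
    new_states = []
    for i, state in enumerate(states):
        others = [s for j, s in enumerate(states) if j != i]
        next_state = []
        for d, value in enumerate(state):
            if others:
                mean_d = sum(o[d] for o in others) / len(others)
            else:
                mean_d = value
            next_state.append(value + coupling * (mean_d - value) + drift * u[d])
        new_states.append(next_state)
    return new_states


def rollout(initial_states, horizon, seed):
    """Predict `horizon` future steps from the initial entity states."""
    rng = random.Random(seed)
    u = sample_latent(len(initial_states[0]), rng)  # sampled once, reused below
    trajectory = [initial_states]
    for _ in range(horizon):
        trajectory.append(step(trajectory[-1], u))
    return trajectory


if __name__ == "__main__":
    start = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    for seed in (0, 1):
        final = rollout(start, horizon=5, seed=seed)[-1]
        print(seed, [[round(x, 3) for x in s] for s in final])
```

Running the script with two seeds prints two different final configurations from the same starting state, which is the toy analogue of sampling two futures for one input image.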

Findings

01 Outperforms alternative representations in multi-modality handling

02 Produces realistic stochastic predictions on object and human activity datasets

03 Enables diverse future video sampling

Abstract

We present an approach for pixel-level future prediction given an input image of a scene. We observe that a scene is composed of distinct entities that undergo motion and present an approach that operationalizes this insight. We implicitly predict future states of independent entities while reasoning about their interactions, and compose future video frames using these predicted states. We overcome the inherent multi-modality of the task using a global trajectory-level latent random variable, and show that this allows us to sample diverse and plausible futures. We empirically validate our approach against alternative representations and ways of incorporating multi-modality. We examine two datasets, one comprising stacked objects that may fall, and the other containing videos of humans performing activities in a gym, and show that our approach allows realistic stochastic video…

Taxonomy

Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Multimodal Machine Learning Applications