Compositional Video Prediction
Yufei Ye, Maneesh Singh, Abhinav Gupta, Shubham Tulsiani

TL;DR
This paper introduces a compositional approach to pixel-level future video prediction that models scene entities and their interactions, producing diverse and realistic stochastic video forecasts on two datasets: stacked objects that may fall, and humans performing activities in a gym.
Contribution
It proposes a novel method that predicts future states of scene entities separately, reasoning about their interactions, and uses a global latent variable to generate diverse plausible futures.
Findings
Validated empirically against alternative entity representations and alternative ways of incorporating multi-modality
Produces realistic stochastic predictions on object and human activity datasets
Enables diverse future video sampling
Abstract
We present an approach for pixel-level future prediction given an input image of a scene. We observe that a scene is composed of distinct entities that undergo motion and present an approach that operationalizes this insight. We implicitly predict future states of independent entities while reasoning about their interactions, and compose future video frames using these predicted states. We overcome the inherent multi-modality of the task using a global trajectory-level latent random variable, and show that this allows us to sample diverse and plausible futures. We empirically validate our approach against alternate representations and ways of incorporating multi-modality. We examine two datasets, one comprising stacked objects that may fall, and the other containing videos of humans performing activities in a gym, and show that our approach allows realistic stochastic video…
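The pipeline the abstract describes (per-entity state prediction with interaction reasoning, one latent variable shared across the whole trajectory, and composition of frames from the predicted states) can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the names `step`, `rollout`, and `compose`, the hand-set weights, and the tanh update are hypothetical stand-ins for the paper's learned networks, and entity states are reduced to 2D positions.

```python
import math
import random


def step(states, u, w_self, w_inter, w_u):
    """Advance every entity one step (toy update, not the paper's learned predictor)."""
    n, dim = len(states), len(states[0])
    totals = [sum(s[d] for s in states) for d in range(dim)]
    out = []
    for s in states:
        new = []
        for d in range(dim):
            # Interaction term: mean state of all *other* entities.
            others = (totals[d] - s[d]) / max(n - 1, 1)
            # The same latent u influences every entity at every step.
            push = sum(w * z for w, z in zip(w_u[d], u))
            new.append(s[d] + 0.1 * math.tanh(w_self * s[d] + w_inter * others + push))
        out.append(new)
    return out


def rollout(states, u, weights, steps):
    """Predict a trajectory; u is sampled once and reused for all steps."""
    trajectory = [states]
    for _ in range(steps):
        states = step(states, u, *weights)
        trajectory.append(states)
    return trajectory


def compose(states, size=8):
    """Compose a frame by marking each entity's (x, y) cell on a blank grid."""
    frame = [[0] * size for _ in range(size)]
    for x, y in states:
        col = min(max(int(x * size), 0), size - 1)
        row = min(max(int(y * size), 0), size - 1)
        frame[row][col] = 1
    return frame


rng = random.Random(0)
weights = (0.5, -0.8, [[1.0, -0.5], [0.3, 0.9]])  # hypothetical, hand-set
initial = [[0.2, 0.3], [0.5, 0.5], [0.8, 0.4]]    # three entities, (x, y) each

# Two latent samples yield two different futures from the same initial scene.
latents = [[rng.gauss(0, 1) for _ in range(2)] for _ in range(2)]
futures = [rollout(initial, u, weights, steps=5) for u in latents]
videos = [[compose(s) for s in traj] for traj in futures]
```

Sampling the latent once per trajectory, rather than once per frame, is what keeps each sampled future internally consistent in this sketch: every step of a rollout is pushed in the same direction, while different samples diverge from one another.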
Taxonomy
Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Multimodal Machine Learning Applications
