Visual Dynamics: Stochastic Future Generation via Layered Cross Convolutional Networks
Tianfan Xue, Jiajun Wu, Katherine L. Bouman, William T. Freeman

TL;DR
This paper introduces a probabilistic approach for future frame synthesis from a single image using a novel Cross Convolutional Network, enabling diverse and realistic motion predictions for both synthetic and real-world data.
Contribution
It presents a new probabilistic model and a Cross Convolutional Network architecture for generating multiple plausible future frames from a single image.
Findings
The model performs well on synthetic data (2D shapes, animated game sprites) and on real-world video frames.
The network implicitly learns a compact encoding of object appearance and motion.
Applications include visual analogy-making and video extrapolation.
Abstract
We study the problem of synthesizing a number of likely future frames from a single input image. In contrast to traditional methods that have tackled this problem in a deterministic or non-parametric way, we propose to model future frames in a probabilistic manner. Our probabilistic model makes it possible for us to sample and synthesize many possible future frames from a single input image. To synthesize realistic movement of objects, we propose a novel network structure, namely a Cross Convolutional Network; this network encodes image and motion information as feature maps and convolutional kernels, respectively. In experiments, our model performs well on synthetic data, such as 2D shapes and animated game sprites, and on real-world video frames. We present analyses of the learned network representations, showing it is implicitly learning a compact encoding of object appearance and…
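To make the idea of treating image content as feature maps and motion as convolutional kernels more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the function names (`conv2d_same`, `cross_convolve`, `sample_kernels`) are hypothetical, the softmax normalization of the kernels is an assumption made for illustration, and `sample_kernels` merely stands in for whatever learned network maps a random motion code to kernels. The sketch shows only the core operation, where each feature map channel is convolved with its own sampled kernel, so that different samples yield different predicted motions for the same image.

```python
import math
import random


def conv2d_same(feature_map, kernel):
    """Slide one odd-sized 2D kernel over one 2D feature map (zero padding).

    Like most deep learning "convolutions", this is a cross-correlation:
    the kernel is not flipped.
    """
    h, w = len(feature_map), len(feature_map[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    y, x = i + di - ph, j + dj - pw
                    if 0 <= y < h and 0 <= x < w:
                        acc += feature_map[y][x] * kernel[di][dj]
            out[i][j] = acc
    return out


def cross_convolve(feature_maps, kernels):
    """Apply the c-th kernel to the c-th feature map (one kernel per channel)."""
    return [conv2d_same(f, k) for f, k in zip(feature_maps, kernels)]


def sample_kernels(num_channels, size, rng):
    """Hypothetical stand-in for a learned kernel generator.

    Draws a random code z ~ N(0, 1) per channel and turns it into a kernel.
    The softmax normalization is an assumption of this sketch.
    """
    kernels = []
    for _ in range(num_channels):
        z = [rng.gauss(0.0, 1.0) for _ in range(size * size)]
        m = max(z)
        e = [math.exp(v - m) for v in z]
        total = sum(e)
        kernels.append(
            [[e[r * size + c] / total for c in range(size)] for r in range(size)]
        )
    return kernels


# A hand-written kernel that encodes "move one pixel to the right".
feature_map = [[0, 0, 0],
               [0, 1, 0],
               [0, 0, 0]]
shift_right = [[0, 0, 0],
               [1, 0, 0],
               [0, 0, 0]]
moved = cross_convolve([feature_map], [shift_right])[0]
# The single active pixel moves from column 1 to column 2 of the middle row.

# Two different random draws give two different kernels, hence two futures.
rng = random.Random(0)
future_a = cross_convolve([feature_map], sample_kernels(1, 3, rng))
future_b = cross_convolve([feature_map], sample_kernels(1, 3, rng))
```

In this toy setting the kernel fully determines the motion: a one-hot kernel translates the feature map, while a sampled kernel blends several shifts. Drawing the kernels from a random code is what makes the prediction stochastic rather than deterministic.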
