Deep multi-scale video prediction beyond mean square error
Michael Mathieu, Camille Couprie, Yann LeCun

TL;DR
This paper introduces a convolutional network for future video frame prediction that combines a multi-scale architecture, adversarial training, and a gradient difference loss to counter the blurry predictions produced by standard MSE training.
Contribution
It proposes a multi-scale architecture, adversarial training, and an image gradient difference loss as complementary strategies for producing sharper predicted frames than MSE training alone.
Findings
Multi-scale architecture enhances prediction detail.
Adversarial training reduces blurriness in generated frames.
Gradient difference loss improves edge preservation.
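The edge-preservation finding can be illustrated with a minimal NumPy sketch of a gradient difference loss. It is written from the form in which this loss is commonly stated (absolute differences between neighboring pixels, compared between prediction and target, raised to a power alpha), not from the authors' code. The function name, the 2-D single-channel input, and the default `alpha=1.0` are assumptions made for illustration.

```python
import numpy as np

def gradient_difference_loss(pred, target, alpha=1.0):
    """Sketch of a gradient difference loss for two 2-D images (assumed form)."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    # Absolute intensity differences between vertically adjacent pixels.
    pred_dy = np.abs(np.diff(pred, axis=0))
    target_dy = np.abs(np.diff(target, axis=0))
    # Absolute intensity differences between horizontally adjacent pixels.
    pred_dx = np.abs(np.diff(pred, axis=1))
    target_dx = np.abs(np.diff(target, axis=1))
    # Penalize mismatches in local gradient magnitude in both directions.
    loss_y = np.sum(np.abs(target_dy - pred_dy) ** alpha)
    loss_x = np.sum(np.abs(target_dx - pred_dx) ** alpha)
    return float(loss_x + loss_y)

# Toy example: a sharp vertical edge versus a smeared version of it.
sharp = np.array([[0.0, 0.0, 1.0, 1.0],
                  [0.0, 0.0, 1.0, 1.0]])
blurry = np.array([[0.0, 0.25, 0.75, 1.0],
                   [0.0, 0.25, 0.75, 1.0]])

print(gradient_difference_loss(sharp, sharp))   # identical gradients, zero loss
print(gradient_difference_loss(blurry, sharp))  # smeared edge is penalized
```

Because the loss compares gradient magnitudes rather than raw intensities, a prediction that spreads an edge over several pixels is penalized even when its per-pixel error is small.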
Abstract
Learning to predict future images from a video sequence involves the construction of an internal representation that models the image evolution accurately, and therefore, to some degree, its content and dynamics. This is why pixel-space video prediction may be viewed as a promising avenue for unsupervised feature learning. In addition, while optical flow has been a very studied problem in computer vision for a long time, future frame prediction is rarely approached. Still, many vision applications could benefit from the knowledge of the next frames of videos, that does not require the complexity of tracking every pixel trajectories. In this work, we train a convolutional network to generate future frames given an input sequence. To deal with the inherently blurry predictions obtained from the standard Mean Squared Error (MSE) loss function, we propose three different and complementary…
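The blurriness that the abstract attributes to MSE can be illustrated with a toy case. This is a sketch under simplifying assumptions, not an experiment from the paper. The 1-D signals, the two equally likely futures, and the helper name `expected_mse` are all hypothetical. When the future is uncertain, the prediction that minimizes expected squared error is the average of the plausible outcomes, which smears the edge.

```python
import numpy as np

def expected_mse(prediction, futures):
    """Average MSE of one prediction over equally likely futures (toy helper)."""
    prediction = np.asarray(prediction, dtype=float)
    errors = [np.mean((prediction - np.asarray(f, dtype=float)) ** 2)
              for f in futures]
    return float(np.mean(errors))

# Two equally likely futures: the same edge, one pixel apart.
future_a = [0.0, 0.0, 1.0, 1.0, 1.0]
future_b = [0.0, 0.0, 0.0, 1.0, 1.0]
futures = [future_a, future_b]

# The pixel-wise mean is the minimizer of expected squared error.
blurry = np.mean(futures, axis=0)

print(blurry)                           # the edge pixel becomes 0.5
print(expected_mse(blurry, futures))    # lower expected error, blurred edge
print(expected_mse(future_a, futures))  # sharp guess, higher expected error
```

The averaged prediction scores better under MSE than either sharp candidate, which is why a loss that rewards sharpness directly is useful alongside it.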
Taxonomy
Topics: Advanced Image Processing Techniques · Advanced Vision and Imaging · Image and Signal Denoising Methods
