FitVid: Overfitting in Pixel-Level Video Prediction
Mohammad Babaeizadeh, Mohammad Taghi Saffar, Suraj Nair, Sergey Levine, Chelsea Finn, Dumitru Erhan

TL;DR
This paper introduces FitVid, a video prediction architecture with enough capacity to severely overfit common benchmarks. It shows that overfitting can yield seemingly high-quality predictions that merely repeat the training data, and that data augmentation mitigates this effect.
Contribution
We propose FitVid, a novel architecture capable of severe overfitting while having a parameter count similar to that of existing models, improving performance across multiple benchmarks.
Findings
FitVid achieves state-of-the-art results on four benchmarks.
Overfitting can produce predictions that look high-quality only because the model repeats its training data.
Data augmentation techniques help mitigate overfitting effects.
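As an illustration of the last finding, here is a minimal sketch of clip-level augmentation for video data. The summary above does not say which augmentations are used, so the random crop and horizontal flip here are assumptions, and `augment_clip` is a hypothetical helper name. The point it shows is that transform parameters are sampled once per clip and applied to every frame, so the motion between frames is preserved.

```python
import numpy as np

def augment_clip(clip, crop_size, rng):
    """Apply one random crop and one random horizontal flip to all frames.

    clip: array of shape (T, H, W, C). crop_size: (crop_h, crop_w).
    The choice of crop + flip is an assumption made for illustration.
    """
    _, h, w, _ = clip.shape
    crop_h, crop_w = crop_size

    # Sample the transform once per clip (not once per frame) so that
    # every frame is shifted and mirrored in exactly the same way.
    top = int(rng.integers(0, h - crop_h + 1))
    left = int(rng.integers(0, w - crop_w + 1))
    flip = bool(rng.integers(0, 2))

    out = clip[:, top:top + crop_h, left:left + crop_w, :]
    if flip:
        out = out[:, :, ::-1, :]  # mirror the width axis of every frame
    return out

rng = np.random.default_rng(0)
clip = np.arange(4 * 8 * 8 * 3).reshape(4, 8, 8, 3)
augmented = augment_clip(clip, (6, 6), rng)
```

For action-conditioned prediction, a geometric transform such as a flip may also change the meaning of the actions, so whether it is appropriate depends on the dataset.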
Abstract
An agent that is capable of predicting what happens next can perform a variety of tasks through planning with no additional training. Furthermore, such an agent can internally represent the complex dynamics of the real-world and therefore can acquire a representation useful for a variety of visual perception tasks. This makes predicting the future frames of a video, conditioned on the observed past and potentially future actions, an interesting task which remains exceptionally challenging despite many recent advances. Existing video prediction models have shown promising results on simple narrow benchmarks but they generate low quality predictions on real-life datasets with more complicated dynamics or broader domain. There is a growing body of evidence that underfitting on the training data is one of the primary causes for the low quality predictions. In this paper, we argue that the…
Taxonomy
Topics: Advanced Image Processing Techniques · Generative Adversarial Networks and Image Synthesis · Advanced Vision and Imaging
