FitVid: Overfitting in Pixel-Level Video Prediction

Mohammad Babaeizadeh; Mohammad Taghi Saffar; Suraj Nair; Sergey Levine; Chelsea Finn; Dumitru Erhan

arXiv:2106.13195·cs.CV·June 25, 2021·29 cites

TL;DR

This paper introduces FitVid, a video prediction architecture that is capable of severely overfitting common benchmarks while using a parameter count similar to existing models. It shows that the resulting overfitting can be mitigated with data augmentation, and reports state-of-the-art results on four benchmarks.

Contribution

We propose FitVid, a novel architecture that is capable of severe overfitting while having a parameter count similar to existing models, and that improves performance across multiple benchmarks.

Findings

01

FitVid achieves state-of-the-art results on four benchmarks.

02

An overfit model can produce high-quality predictions simply by repeating its training data.

03

Data augmentation techniques help mitigate overfitting effects.
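
As a rough illustration of the third finding, the sketch below shows one way image augmentation might be applied to a video clip. The page does not say which augmentations the paper uses, so the function name `augment_clip`, the random crop, the horizontal flip, and the clip shape are all illustrative assumptions. The one design choice shown is that the same random transform is reused for every frame, so that motion between frames stays consistent.

```python
import numpy as np


def augment_clip(clip, crop_size, rng):
    """Apply one random crop and one random horizontal flip to a whole clip.

    Hypothetical helper for illustration only; not taken from the paper.
    clip: array of shape (T, H, W, C).
    crop_size: (crop_height, crop_width), each no larger than H and W.
    """
    _, height, width, _ = clip.shape
    crop_h, crop_w = crop_size
    # Sample the crop window once and reuse it for all frames.
    top = rng.integers(0, height - crop_h + 1)
    left = rng.integers(0, width - crop_w + 1)
    out = clip[:, top:top + crop_h, left:left + crop_w, :]
    # Flip every frame or none, so the direction of motion stays coherent.
    if rng.random() < 0.5:
        out = out[:, :, ::-1, :]
    return out


rng = np.random.default_rng(0)
clip = rng.random((8, 64, 64, 3)).astype(np.float32)  # assumed toy clip
augmented = augment_clip(clip, (56, 56), rng)
print(augmented.shape)  # (8, 56, 56, 3)
```

Sampling the transform per frame instead would add artificial jitter between frames, which a video prediction model could mistake for real dynamics.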

Abstract

An agent that is capable of predicting what happens next can perform a variety of tasks through planning with no additional training. Furthermore, such an agent can internally represent the complex dynamics of the real world and therefore can acquire a representation useful for a variety of visual perception tasks. This makes predicting the future frames of a video, conditioned on the observed past and potentially future actions, an interesting task which remains exceptionally challenging despite many recent advances. Existing video prediction models have shown promising results on simple, narrow benchmarks, but they generate low-quality predictions on real-life datasets with more complicated dynamics or broader domains. There is a growing body of evidence that underfitting on the training data is one of the primary causes of these low-quality predictions. In this paper, we argue that the…

Code & Models

Repositories

google-research/fitvid (JAX, official)

Taxonomy

Topics: Advanced Image Processing Techniques · Generative Adversarial Networks and Image Synthesis · Advanced Vision and Imaging