Transformation-Based Models of Video Sequences
Joost van Amersfoort, Anitha Kannan, Marc'Aurelio Ranzato, Arthur Szlam, Du Tran, Soumith Chintala

TL;DR
This paper introduces an unsupervised transformation-based model for next video frame prediction that predicts transformation parameters instead of raw pixels, resulting in sharper frames and improved efficiency.
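To make "predicting transformation parameters instead of raw pixels" concrete, the following is a minimal sketch and not the authors' implementation. It assumes the transformation is a 2x3 affine matrix applied to a small image patch, which the summary above does not specify; the function name `apply_affine` and the nearest-neighbour sampling are illustrative choices. The point is that a model only has to output the six numbers in `A`, and the next patch is produced by moving existing pixels rather than synthesising new ones.

```python
import numpy as np

def apply_affine(patch, A):
    """Warp a 2-D patch with a 2x3 affine matrix A.

    A maps output pixel coordinates (x, y, 1) to source coordinates.
    Sampling is nearest-neighbour with clamping at the patch border
    (both are simplifying assumptions for this sketch).
    """
    h, w = patch.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Homogeneous output coordinates, shape 3 x (h*w).
    coords = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])
    src = A @ coords  # row 0: source x, row 1: source y
    sx = np.clip(np.rint(src[0]).astype(int), 0, w - 1)
    sy = np.clip(np.rint(src[1]).astype(int), 0, h - 1)
    return patch[sy, sx].reshape(h, w)

# Hypothetical "predicted" parameters: identity, and a one-pixel shift to the right.
identity = np.array([[1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0]])
shift_right = np.array([[1.0, 0.0, -1.0],
                        [0.0, 1.0, 0.0]])

patch = np.arange(16, dtype=float).reshape(4, 4)
same = apply_affine(patch, identity)
moved = apply_affine(patch, shift_right)
```

Because the output is assembled from pixels already present in the input, the warped patch stays as sharp as the source, which is the intuition behind the sharper predictions claimed above.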
Contribution
It proposes a novel transformation-based prediction method and a new evaluation protocol to fairly compare video prediction models.
Findings
Compares favourably against more sophisticated methods on the UCF-101 dataset
Produces sharper and more realistic frames
Requires fewer parameters and less computation
Abstract
In this work we propose a simple unsupervised approach for next frame prediction in video. Instead of directly predicting the pixels in a frame given past frames, we predict the transformations needed for generating the next frame in a sequence, given the transformations of the past frames. This leads to sharper results, while using a smaller prediction model. In order to enable a fair comparison between different video frame prediction models, we also propose a new evaluation protocol. We use generated frames as input to a classifier trained with ground truth sequences. This criterion guarantees that models scoring high are those producing sequences which preserve discriminative features, as opposed to merely penalizing any deviation, plausible or not, from the ground truth. Our proposed approach compares favourably against more sophisticated ones on the UCF-101 data set, while also…
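The evaluation protocol described in the abstract can be illustrated with a toy sketch. Everything here is an assumption made for illustration: three-dimensional feature vectors stand in for video sequences, a nearest-centroid rule stands in for the classifier, and the names `fit_centroids`, `predict`, and `accuracy_on_generated` are hypothetical. The classifier is fitted on ground-truth data only and then scored on generated data.

```python
import numpy as np

def fit_centroids(features, labels):
    """Fit a nearest-centroid classifier on ground-truth features."""
    classes = np.unique(labels)
    centroids = np.stack([features[labels == c].mean(axis=0) for c in classes])
    return classes, centroids

def predict(classes, centroids, features):
    """Assign each feature vector to the class of its closest centroid."""
    dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
    return classes[dists.argmin(axis=1)]

def accuracy_on_generated(classes, centroids, generated, labels):
    """Score generated samples with the classifier trained on ground truth."""
    return float((predict(classes, centroids, generated) == labels).mean())

rng = np.random.default_rng(0)
labels = np.repeat(np.arange(3), 20)        # 3 classes, 20 samples each
prototypes = np.eye(3) * 10.0               # well-separated toy classes
real = prototypes[labels] + rng.normal(scale=0.1, size=(60, 3))
classes, centroids = fit_centroids(real, labels)

# A "generator" that keeps discriminative structure, versus one that
# collapses every sample to the same average vector.
faithful = real + rng.normal(scale=0.1, size=real.shape)
collapsed = np.full_like(real, real.mean())

acc_faithful = accuracy_on_generated(classes, centroids, faithful, labels)
acc_collapsed = accuracy_on_generated(classes, centroids, collapsed, labels)
```

In this toy setting the faithful samples deviate from the ground truth yet are still classified correctly, while the collapsed samples fall to chance level. That mirrors the stated goal of rewarding sequences that preserve discriminative features rather than penalizing every deviation from the ground truth.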
Taxonomy
Topics: Advanced Image Processing Techniques · Advanced Vision and Imaging · Image and Video Quality Assessment
