Video (language) modeling: a baseline for generative models of natural videos

Marc'Aurelio Ranzato; Arthur Szlam; Joan Bruna; Michael Mathieu; Ronan Collobert; Sumit Chopra

arXiv:1412.6604 · cs.LG · May 5, 2016 · 302 cites

TL;DR

This paper introduces a baseline for unsupervised video feature learning by adapting language modeling techniques to predict missing or future frames, capturing complex motion and deformation patterns.

Contribution

The paper adapts language modeling methods to the vision domain by quantizing image patches into a large dictionary, and shows that the resulting model predicts non-trivial motion over short sequences of natural video.

Findings

01. Model predicts non-trivial motions in short video sequences
02. Effective learning of spatial and temporal correlations
03. Applicable to both filling and generation tasks

Abstract

We propose a strong baseline model for unsupervised feature learning using video data. By learning to predict missing frames or extrapolate future frames from an input video sequence, the model discovers both spatial and temporal correlations which are useful to represent complex deformations and motion patterns. The models we propose are largely borrowed from the language modeling literature, and adapted to the vision domain by quantizing the space of image patches into a large dictionary. We demonstrate the approach on both a filling and a generation task. For the first time, we show that, after training on natural videos, such a model can predict non-trivial motions over short video sequences.
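The quantization idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the patch size, the dictionary size, or the model details, so everything below is an assumption made for illustration: plain k-means stands in for the dictionary learner, a per-location bigram count model stands in for the language model, and the function names (`extract_patches`, `fit_dictionary`, `quantize`, `fit_bigram`, `predict_next`) are hypothetical. It is not the authors' implementation.

```python
import numpy as np

def extract_patches(frame, patch):
    """Split a 2-D grayscale frame into non-overlapping patch x patch tiles, flattened."""
    h, w = frame.shape
    gh, gw = h // patch, w // patch
    tiles = frame[:gh * patch, :gw * patch].reshape(gh, patch, gw, patch)
    return tiles.transpose(0, 2, 1, 3).reshape(gh * gw, patch * patch)

def quantize(patches, centroids):
    """Return the index of the nearest centroid (squared Euclidean) for each patch."""
    dist = ((patches[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dist.argmin(axis=1)

def fit_dictionary(patches, k, iters=10, seed=0):
    """Plain k-means (Lloyd's algorithm); an assumed stand-in for the dictionary learner."""
    rng = np.random.default_rng(seed)
    centroids = patches[rng.choice(len(patches), size=k, replace=False)].astype(float)
    for _ in range(iters):
        labels = quantize(patches, centroids)
        for j in range(k):
            members = patches[labels == j]
            if len(members):  # leave empty clusters where they are
                centroids[j] = members.mean(axis=0)
    return centroids

def fit_bigram(token_frames, k):
    """Count token transitions between consecutive frames at the same grid location.

    Add-one smoothing keeps every row a valid probability distribution.
    """
    counts = np.ones((k, k))
    for prev, nxt in zip(token_frames[:-1], token_frames[1:]):
        np.add.at(counts, (prev, nxt), 1)
    return counts / counts.sum(axis=1, keepdims=True)

def predict_next(tokens, probs):
    """Most likely next token at each location given the current token."""
    return probs[tokens].argmax(axis=1)

# Usage on synthetic data (random frames, so the predictions carry no meaning).
rng = np.random.default_rng(0)
video = rng.random((8, 16, 16))                      # 8 frames of 16 x 16 pixels
patches = np.concatenate([extract_patches(f, 4) for f in video])
dictionary = fit_dictionary(patches, k=8)            # 8 "words", each a 4 x 4 patch
token_frames = [quantize(extract_patches(f, 4), dictionary) for f in video]
probs = fit_bigram(token_frames, k=8)
predicted = predict_next(token_frames[-1], probs)    # token grid for a hypothetical next frame
```

Once frames are token grids, predicting a missing or future frame becomes a discrete prediction problem of the kind language models handle. A predicted token can be mapped back to pixels by looking up its centroid in `dictionary`.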

Code & Models

Repositories

amritanjali123/NM373_Future_Predicators

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Video Analysis and Summarization · Advanced Vision and Imaging