Unsupervised Learning of Video Representations using LSTMs

Nitish Srivastava; Elman Mansimov; Ruslan Salakhutdinov

arXiv:1502.04681 · cs.LG · January 5, 2016 · 1.7k cites

TL;DR

This paper introduces an unsupervised approach using LSTM networks to learn video representations, which can be used for various tasks including sequence reconstruction, future prediction, and improving action recognition, especially with limited labeled data.

Contribution

The paper presents a novel unsupervised learning framework with LSTMs for video representation that enhances downstream tasks like action recognition, even with out-of-domain pretraining.

Findings

01 LSTM-based representations improve action recognition accuracy.

02 Pretraining on large unlabeled video data benefits supervised tasks.

03 The model can extrapolate the learned video representation into the future and into the past, and the learned features can be visualized and interpreted.

Abstract

We use multilayer Long Short Term Memory (LSTM) networks to learn representations of video sequences. Our model uses an encoder LSTM to map an input sequence into a fixed length representation. This representation is decoded using single or multiple decoder LSTMs to perform different tasks, such as reconstructing the input sequence, or predicting the future sequence. We experiment with two kinds of input sequences - patches of image pixels and high-level representations ("percepts") of video frames extracted using a pretrained convolutional net. We explore different design choices such as whether the decoder LSTMs should condition on the generated output. We analyze the outputs of the model qualitatively to see how well the model can extrapolate the learned video representation into the future and into the past. We try to visualize and interpret the learned features. We stress test the…
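
The encoder/decoder arrangement described in the abstract can be sketched roughly as follows. This is a minimal, untrained forward pass in plain Python, not the authors' implementation: the class `LSTMCell`, the helpers `encode` and `decode`, the sizes, and the random weights are all assumptions made for illustration. It shows one encoder summarizing a sequence into a fixed-length state, and two decoders reading that state, one without conditioning on its own output (reconstruction) and one with conditioning (future prediction).

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class LSTMCell:
    """Hypothetical single LSTM cell with random, untrained weights."""

    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        n_in = input_size + hidden_size
        # One weight matrix and bias per gate: input, forget, output, candidate.
        self.W = {
            gate: [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
                   for _ in range(hidden_size)]
            for gate in ("i", "f", "o", "g")
        }
        self.b = {gate: [0.0] * hidden_size for gate in ("i", "f", "o", "g")}

    def step(self, x, h, c):
        z = list(x) + list(h)  # concatenate input and previous hidden state

        def affine(gate):
            return [sum(w * v for w, v in zip(row, z)) + b
                    for row, b in zip(self.W[gate], self.b[gate])]

        i = [sigmoid(a) for a in affine("i")]
        f = [sigmoid(a) for a in affine("f")]
        o = [sigmoid(a) for a in affine("o")]
        cand = [math.tanh(a) for a in affine("g")]
        c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, cand)]
        h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
        return h_new, c_new


def encode(cell, frames):
    """Run the encoder over all frames; the final (h, c) is the representation."""
    h = [0.0] * cell.hidden_size
    c = [0.0] * cell.hidden_size
    for x in frames:
        h, c = cell.step(x, h, c)
    return h, c


def decode(cell, W_out, state, steps, frame_size, conditioned):
    """Unroll a decoder from the encoder state for a fixed number of steps."""
    h, c = state
    prev = [0.0] * frame_size  # unconditioned decoders always see zeros
    outputs = []
    for _ in range(steps):
        h, c = cell.step(prev, h, c)
        y = [sum(w * v for w, v in zip(row, h)) for row in W_out]
        outputs.append(y)
        if conditioned:
            prev = y  # feed the generated frame back in as the next input
    return outputs


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
FRAME, HIDDEN, T_IN, T_FUTURE = 4, 8, 5, 3  # illustrative sizes only

# Stand-in for a video: T_IN frames, each a flat vector of FRAME values.
frames = [[rng.random() for _ in range(FRAME)] for _ in range(T_IN)]

encoder = LSTMCell(FRAME, HIDDEN, rng)
recon_decoder = LSTMCell(FRAME, HIDDEN, rng)
future_decoder = LSTMCell(FRAME, HIDDEN, rng)
W_recon = random_matrix(FRAME, HIDDEN, rng)
W_future = random_matrix(FRAME, HIDDEN, rng)

state = encode(encoder, frames)
reconstruction = decode(recon_decoder, W_recon, state, T_IN, FRAME, conditioned=False)
prediction = decode(future_decoder, W_future, state, T_FUTURE, FRAME, conditioned=True)

print(len(state[0]), len(reconstruction), len(prediction))
```

The state returned by `encode` has the same size regardless of how many frames are fed in, which is the "fixed length representation" property the abstract refers to. Training (for example, minimizing a reconstruction or prediction error on the decoder outputs) is deliberately left out of this sketch.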

Taxonomy

Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Anomaly Detection Techniques and Applications

Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory