Unsupervised Learning of Video Representations using LSTMs
Nitish Srivastava, Elman Mansimov, Ruslan Salakhutdinov

TL;DR
This paper trains LSTM encoder-decoder networks without labels to learn video representations. The learned representations support input reconstruction and future-sequence prediction, and they improve action recognition, especially when labeled data is limited.
Contribution
The paper presents an unsupervised LSTM encoder-decoder framework for learning video representations that improve downstream tasks such as action recognition, even when pretraining uses out-of-domain video.
Findings
LSTM-based representations improve action recognition accuracy.
Pretraining on large unlabeled video data benefits supervised tasks.
The model can extrapolate the learned video representation into the future and into the past; the authors also visualize the learned features.
Abstract
We use multilayer Long Short Term Memory (LSTM) networks to learn representations of video sequences. Our model uses an encoder LSTM to map an input sequence into a fixed length representation. This representation is decoded using single or multiple decoder LSTMs to perform different tasks, such as reconstructing the input sequence, or predicting the future sequence. We experiment with two kinds of input sequences - patches of image pixels and high-level representations ("percepts") of video frames extracted using a pretrained convolutional net. We explore different design choices such as whether the decoder LSTMs should condition on the generated output. We analyze the outputs of the model qualitatively to see how well the model can extrapolate the learned video representation into the future and into the past. We try to visualize and interpret the learned features. We stress test the…
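The encoder-decoder layout described in the abstract can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a standard single-layer LSTM cell without peephole connections, random untrained weights, toy 4-dimensional "frames", and one linear readout shared by both decoders for brevity. The names `LSTMCell`, `encode`, `decode`, and `readout` are hypothetical. The `conditioned` flag illustrates the design choice the abstract mentions, namely whether a decoder receives its own previously generated output as input.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class LSTMCell:
    """Standard LSTM cell (no peephole connections), written with plain lists."""

    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        cols = input_size + hidden_size + 1  # input, previous hidden state, bias
        # Rows are stacked as [input gate, forget gate, output gate, candidate].
        self.weights = [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                        for _ in range(4 * hidden_size)]

    def step(self, x, h, c):
        z = list(x) + list(h) + [1.0]
        pre = [sum(w * v for w, v in zip(row, z)) for row in self.weights]
        n = self.hidden_size
        i = [sigmoid(p) for p in pre[0:n]]            # input gate
        f = [sigmoid(p) for p in pre[n:2 * n]]        # forget gate
        o = [sigmoid(p) for p in pre[2 * n:3 * n]]    # output gate
        g = [math.tanh(p) for p in pre[3 * n:4 * n]]  # candidate cell update
        c_new = [f[k] * c[k] + i[k] * g[k] for k in range(n)]
        h_new = [o[k] * math.tanh(c_new[k]) for k in range(n)]
        return h_new, c_new


def encode(cell, frames):
    """Run the encoder over all frames; the final (h, c) is the fixed-length code."""
    h = [0.0] * cell.hidden_size
    c = [0.0] * cell.hidden_size
    for x in frames:
        h, c = cell.step(x, h, c)
    return h, c


def decode(cell, readout, state, steps, frame_size, conditioned):
    """Unroll a decoder from the encoder state for `steps` output frames.

    If `conditioned` is True, each step is fed the previously generated frame;
    otherwise it is fed zeros.
    """
    h, c = state
    prev = [0.0] * frame_size
    outputs = []
    for _ in range(steps):
        x = prev if conditioned else [0.0] * frame_size
        h, c = cell.step(x, h, c)
        frame = [sum(w * v for w, v in zip(row, h)) for row in readout]
        outputs.append(frame)
        prev = frame
    return outputs


rng = random.Random(0)
frame_size, hidden_size, seq_len = 4, 8, 5  # toy sizes, chosen for illustration
encoder = LSTMCell(frame_size, hidden_size, rng)
recon_decoder = LSTMCell(frame_size, hidden_size, rng)
future_decoder = LSTMCell(frame_size, hidden_size, rng)
# Assumption: one linear readout shared by both decoders, to keep the sketch short.
readout = [[rng.uniform(-0.1, 0.1) for _ in range(hidden_size)]
           for _ in range(frame_size)]

video = [[rng.random() for _ in range(frame_size)] for _ in range(seq_len)]
state = encode(encoder, video)
reconstruction = decode(recon_decoder, readout, state, seq_len, frame_size, conditioned=True)
prediction = decode(future_decoder, readout, state, 3, frame_size, conditioned=False)
print(len(state[0]), len(reconstruction), len(prediction))  # prints: 8 5 3
```

Both decoders start from the same encoder state, so the fixed-length code has to carry enough information for reconstruction and for prediction at once. With untrained weights the outputs are meaningless; the sketch only shows how data flows through the encoder and the two decoders.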
Taxonomy
Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Anomaly Detection Techniques and Applications
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory
