Video Representation Learning by Dense Predictive Coding

Tengda Han; Weidi Xie; Andrew Zisserman

arXiv:1909.04656 · cs.CV · September 30, 2019 · 48 cites

TL;DR

This paper introduces Dense Predictive Coding (DPC), a self-supervised framework that learns video representations by recurrently predicting future spatio-temporal features. After fine-tuning, the learned representation achieves state-of-the-art self-supervised action recognition results on UCF101 and HMDB51.

Contribution

The paper introduces the DPC framework, proposes a curriculum training scheme that predicts further into the future with progressively less temporal context, and evaluates the approach by pretraining on Kinetics-400 and fine-tuning for action recognition.

Findings

01. DPC achieves state-of-the-art self-supervised results on UCF101 and HMDB51.

02. Pretraining with DPC approaches ImageNet pretraining performance.

03. The curriculum scheme improves the semantic quality of learned representations.

Abstract

The objective of this paper is self-supervised learning of spatio-temporal embeddings from video, suitable for human action recognition. We make three contributions: First, we introduce the Dense Predictive Coding (DPC) framework for self-supervised representation learning on videos. This learns a dense encoding of spatio-temporal blocks by recurrently predicting future representations; Second, we propose a curriculum training scheme to predict further into the future with progressively less temporal context. This encourages the model to only encode slowly varying spatial-temporal signals, therefore leading to semantic representations; Third, we evaluate the approach by first training the DPC model on the Kinetics-400 dataset with self-supervised learning, and then finetuning the representation on a downstream task, i.e. action recognition. With single stream (RGB only), DPC pretrained…
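The first two contributions in the abstract (recurrent prediction of future block representations, and a curriculum that predicts further ahead from less context) can be illustrated with a small toy sketch. This is not the authors' implementation: the names `split_blocks`, `aggregate`, `predict_recurrently` and `curriculum_stages` are hypothetical, the running-mean aggregator is only a stand-in for a learned recurrent model, feature vectors are plain lists of floats, and the training loss is omitted entirely.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def split_blocks(blocks: Sequence[Vector], num_pred: int) -> Tuple[List[Vector], List[Vector]]:
    """Split a clip's block features into (context, future targets)."""
    if not 0 < num_pred < len(blocks):
        raise ValueError("num_pred must leave at least one context block")
    return list(blocks[:-num_pred]), list(blocks[-num_pred:])


def aggregate(history: Sequence[Vector]) -> Vector:
    """Toy stand-in for a learned recurrent aggregator: a running mean."""
    state = [0.0] * len(history[0])
    for t, z in enumerate(history, start=1):
        state = [s + (x - s) / t for s, x in zip(state, z)]
    return state


def predict_recurrently(
    context: Sequence[Vector],
    num_pred: int,
    predictor: Callable[[Vector], Vector],
) -> List[Vector]:
    """Predict num_pred future features, feeding each prediction back in."""
    history = list(context)
    predictions = []
    for _ in range(num_pred):
        z_hat = predictor(aggregate(history))
        predictions.append(z_hat)
        history.append(z_hat)  # the next step conditions on this prediction
    return predictions


def curriculum_stages(total_blocks: int, pred_steps: Sequence[int]) -> List[Tuple[int, int]]:
    """Return (num_context, num_pred) per stage for a fixed clip length."""
    stages = []
    for p in pred_steps:
        if not 0 < p < total_blocks:
            raise ValueError("each stage needs at least one context block")
        stages.append((total_blocks - p, p))
    return stages


if __name__ == "__main__":
    # Illustrative values only: 8 blocks per clip, two curriculum stages.
    print(curriculum_stages(8, [3, 4]))
    blocks = [[1.0], [3.0], [5.0], [7.0]]
    context, targets = split_blocks(blocks, num_pred=2)
    print(predict_recurrently(context, 2, predictor=lambda s: list(s)))
```

With a fixed number of blocks per clip, increasing the number of predicted steps automatically shrinks the context, which is one simple way to read "further into the future with progressively less temporal context". In a real model the predictor and aggregator would be learned networks, and predictions would be compared against the held-out target features.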

Code & Models

Repositories

TengdaHan/DPC (PyTorch, official)

Taxonomy

Topics: Human Pose and Action Recognition · Domain Adaptation and Few-Shot Learning · Anomaly Detection Techniques and Applications

Methods: Average Pooling · Residual Connection · 1x1 Convolution · Batch Normalization · Bottleneck Residual Block · Global Average Pooling · Residual Block · Kaiming Initialization · Max Pooling