Memory-augmented Dense Predictive Coding for Video Representation Learning

Tengda Han; Weidi Xie; Andrew Zisserman

arXiv:2008.01065 · cs.CV · August 4, 2020 · 81 cites

TL;DR

This paper introduces MemDPC, a self-supervised framework that learns video representations through memory-augmented dense predictive coding. It targets action recognition and related downstream tasks while using less training data than prior methods.

Contribution

The paper proposes MemDPC, a new architecture that applies a predictive attention mechanism over a set of compressed memories, so that multiple hypotheses about future states can be represented efficiently during video representation learning.

Findings

01

Achieves state-of-the-art or comparable performance on four downstream tasks.

02

Requires significantly less training data than previous methods.

03

Learns effective representations from RGB frames, from unsupervised optical flow, or from both.

Abstract

The objective of this paper is self-supervised learning from video, in particular of representations for action recognition. We make the following contributions: (i) We propose a new architecture and learning framework, Memory-augmented Dense Predictive Coding (MemDPC), for the task. It is trained with a predictive attention mechanism over a set of compressed memories, such that any future state can always be constructed as a convex combination of the condensed representations, allowing multiple hypotheses to be made efficiently. (ii) We investigate visual-only self-supervised video representation learning from RGB frames, from unsupervised optical flow, or from both. (iii) We thoroughly evaluate the quality of the learnt representations on four different downstream tasks: action recognition, video retrieval, learning with scarce annotations, and unintentional action classification. In all…
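The abstract's central mechanism, predicting a future state as a convex combination of compressed memory slots, can be illustrated with a small NumPy sketch. This is not the authors' code: it assumes a linear projection (`proj`) maps an aggregated context vector to one logit per memory slot, and that a softmax over those logits gives the mixing weights. The function and variable names (`predict_from_memory`, `proj`, `memory`) are hypothetical, and random arrays stand in for parameters that would be learned in practice.

```python
import numpy as np


def softmax(x: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    shifted = x - x.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def predict_from_memory(context: np.ndarray, proj: np.ndarray, memory: np.ndarray):
    """Predict a future state as a convex combination of memory slots.

    Hypothetical sketch, not the MemDPC implementation.
    context: (B, C) aggregated context features.
    proj:    (C, K) assumed linear map from context to K slot logits.
    memory:  (K, C) bank of compressed memory slots (learned in practice).
    Returns (weights, prediction) with shapes (B, K) and (B, C).
    """
    logits = context @ proj        # (B, K): one addressing score per slot
    weights = softmax(logits)      # non-negative, each row sums to 1
    prediction = weights @ memory  # (B, C): convex combination of slots
    return weights, prediction


# Random stand-ins for learned parameters (assumption for illustration).
rng = np.random.default_rng(0)
B, C, K = 2, 8, 16
context = rng.standard_normal((B, C))
proj = rng.standard_normal((C, K))
memory = rng.standard_normal((K, C))

weights, prediction = predict_from_memory(context, proj, memory)
print(weights.shape, prediction.shape)  # (2, 16) (2, 8)
```

Because the weights are non-negative and sum to one, every prediction lies inside the convex hull of the memory slots. One way to read the abstract's "multiple hypotheses" claim is that different weight vectors select different mixtures of the same shared slots, so several plausible futures can be expressed without a separate predictor per hypothesis.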

Code & Models

Repositories

TengdaHan/MemDPC
PyTorch · Official

Taxonomy

Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning