Self-supervised Co-training for Video Representation Learning

Tengda Han; Weidi Xie; Andrew Zisserman

arXiv:2010.09709 · cs.CV · January 13, 2021 · 273 cites

TL;DR

This paper introduces a self-supervised co-training method for video representation learning. It uses complementary views of the same video (RGB and optical flow) to supply additional positives for contrastive learning, and reaches state-of-the-art or comparable results on action recognition and video retrieval while requiring less training data.

Contribution

The paper proposes a co-training scheme in which two complementary views of the same video, RGB and optical flow, provide positive samples for each other, improving InfoNCE-based contrastive learning of video representations.

Findings

01. Achieves state-of-the-art or comparable performance on two downstream tasks: action recognition and video retrieval.

02. Requires less training data to reach similar performance.

03. Improves InfoNCE contrastive learning by co-training on complementary views (RGB and optical flow).

Abstract

The objective of this paper is visual-only self-supervised video representation learning. We make the following contributions: (i) we investigate the benefit of adding semantic-class positives to instance-based Info Noise Contrastive Estimation (InfoNCE) training, showing that this form of supervised contrastive learning leads to a clear improvement in performance; (ii) we propose a novel self-supervised co-training scheme to improve the popular InfoNCE loss, exploiting the complementary information from different views, RGB streams and optical flow, of the same data source by using one view to obtain positive class samples for the other; (iii) we thoroughly evaluate the quality of the learnt representation on two different downstream tasks: action recognition and video retrieval. In both cases, the proposed approach demonstrates state-of-the-art or comparable performance with other…
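
The co-training idea in contribution (ii) can be illustrated with a short sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes positives for one view are mined as the top-k nearest neighbours (by cosine similarity) in the other view, and that the loss places every mined positive in the numerator of InfoNCE. The function names `mine_positives` and `multi_positive_info_nce`, the temperature value, and the toy features are all hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def mine_positives(other_view, k):
    """For each sample i, return {i} plus the indices of its k nearest
    neighbours in the *other* view (assumed mining rule: cosine similarity)."""
    feats = [normalize(v) for v in other_view]
    positives = []
    for i, q in enumerate(feats):
        sims = [(dot(q, f), j) for j, f in enumerate(feats) if j != i]
        sims.sort(reverse=True)
        positives.append({i} | {j for _, j in sims[:k]})
    return positives


def multi_positive_info_nce(queries, keys, positives, temperature=0.07):
    """Mean over i of -log( sum_{p in P_i} exp(q_i.k_p / t) / sum_j exp(q_i.k_j / t) )."""
    q = [normalize(v) for v in queries]
    k = [normalize(v) for v in keys]
    total = 0.0
    for i, qi in enumerate(q):
        logits = [dot(qi, kj) / temperature for kj in k]
        m = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(l - m) for l in logits]
        pos = sum(exps[p] for p in positives[i])
        total += -math.log(pos / sum(exps))
    return total / len(q)


# Toy data: two augmented RGB embeddings per clip and one flow embedding per clip.
rgb_q = [[1.0, 0.2], [0.8, 0.3], [0.1, 1.0], [0.3, 0.9]]
rgb_k = [[0.9, 0.1], [1.0, 0.4], [0.2, 0.8], [0.2, 1.0]]
flow = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]

# Flow decides which clips count as positives when training the RGB encoder.
flow_positives = mine_positives(flow, k=1)
rgb_loss = multi_positive_info_nce(rgb_q, rgb_k, flow_positives)
```

With an instance-only positive set (`{i}` for each sample `i`), the same function reduces to the standard InfoNCE loss, so the two settings can be compared directly. Swapping the roles of the two views gives the symmetric step in which RGB supplies positives for optical flow.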

Code & Models

Repositories

TengdaHan/CoCLR (PyTorch, official)

Videos

Self-supervised Co-Training for Video Representation Learning · SlidesLive

Taxonomy

Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning

Methods: Contrastive Learning · InfoNCE