Self-Supervised Video Representation Learning with Meta-Contrastive Network

Yuanze Lin; Xun Guo; Yan Lu

arXiv:2108.08426 · cs.CV · August 24, 2021 · 1 cites

Open Access

TL;DR

This paper introduces a Meta-Contrastive Network (MCN) that combines contrastive and meta learning to improve self-supervised video representation learning, addressing the hard-positive problem and enhancing downstream task performance.

Contribution

The novel integration of contrastive learning with meta learning in MCN to improve generalization in self-supervised video representation learning.

Findings

01 MCN outperforms state-of-the-art methods on the UCF101 and HMDB51 datasets.

02 MCN achieves 84.8% Top-1 accuracy in video action recognition with an R(2+1)D backbone.

03 MCN improves video retrieval accuracy to 52.5% on UCF101.

Abstract

Self-supervised learning has been successfully applied to pre-train video representations, with the aim of efficient adaptation from the pre-training domain to downstream tasks. Existing approaches merely leverage a contrastive loss to learn instance-level discrimination. However, the lack of category information leads to a hard-positive problem that constrains the generalization ability of such methods. We find that the multi-task process of meta learning can provide a solution to this problem. In this paper, we propose a Meta-Contrastive Network (MCN), which combines contrastive learning and meta learning, to enhance the learning ability of existing self-supervised approaches. Our method contains two training stages based on model-agnostic meta learning (MAML), each of which consists of a contrastive branch and a meta branch. Extensive evaluations demonstrate the effectiveness of…
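
The instance-level contrastive loss the abstract refers to can be illustrated with a generic InfoNCE-style objective. The sketch below is not the MCN loss and does not implement the meta branch or the MAML stages; it only shows how each clip embedding is pulled toward its own augmented view and pushed away from every other clip in the batch. The function name `info_nce_loss`, the `temperature` value, and the toy embeddings are assumptions chosen for illustration.

```python
import math


def info_nce_loss(anchors, positives, temperature=0.1):
    """Generic InfoNCE loss (illustrative, not the paper's exact objective).

    anchors[i] and positives[i] are two views of the same instance;
    every positives[j] with j != i acts as a negative for anchors[i].
    """
    def normalize(vec):
        # L2-normalize so the dot product below is cosine similarity.
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        return [x / norm for x in vec]

    a = [normalize(v) for v in anchors]
    p = [normalize(v) for v in positives]

    total = 0.0
    for i, a_i in enumerate(a):
        # Temperature-scaled similarity of anchor i to every candidate.
        logits = [sum(x * y for x, y in zip(a_i, p_j)) / temperature for p_j in p]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        # Cross-entropy with the matching view as the correct class.
        total += log_denom - logits[i]
    return total / len(a)


# Toy embeddings (hypothetical): two instances, two views each.
views_a = [[1.0, 0.0], [0.0, 1.0]]
views_b = [[1.0, 0.0], [0.0, 1.0]]
aligned_loss = info_nce_loss(views_a, views_b)
swapped_loss = info_nce_loss(views_a, list(reversed(views_b)))
print(aligned_loss < swapped_loss)
```

Because this objective treats every other instance as a negative, two clips of the same action are pushed apart even though they share a category, which is the hard-positive problem the abstract describes.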

Taxonomy

Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning

Methods: Contrastive Learning · Average Pooling · Global Average Pooling · Residual Connection · Dense Connections · (2+1)D Convolution · Batch Normalization · R(2+1)D