Self-Supervised Video Representation Learning with Meta-Contrastive Network
Yuanze Lin, Xun Guo, Yan Lu

TL;DR
This paper introduces a Meta-Contrastive Network (MCN) that combines contrastive and meta learning to improve self-supervised video representation learning, addressing the hard-positive problem and enhancing downstream task performance.
Contribution
MCN integrates contrastive learning with meta learning to improve generalization in self-supervised video representation learning.
Findings
MCN outperforms state-of-the-art methods on UCF101 and HMDB51 datasets.
Achieves 84.8% Top-1 accuracy in video action recognition with R(2+1)D backbone.
Improves video retrieval accuracy to 52.5% on UCF101.
Abstract
Self-supervised learning has been successfully applied to pre-train video representations, with the aim of efficient adaptation from the pre-training domain to downstream tasks. Existing approaches merely leverage a contrastive loss to learn instance-level discrimination. However, the lack of category information leads to a hard-positive problem that constrains the generalization ability of such methods. We find that the multi-task process of meta learning can provide a solution to this problem. In this paper, we propose a Meta-Contrastive Network (MCN), which combines contrastive learning and meta learning, to enhance the learning ability of existing self-supervised approaches. Our method contains two training stages based on model-agnostic meta learning (MAML), each of which consists of a contrastive branch and a meta branch. Extensive evaluations demonstrate the effectiveness of…
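The instance-level discrimination that the abstract attributes to existing approaches is commonly implemented with an InfoNCE-style contrastive loss. The following is a minimal NumPy sketch of that generic loss, not the MCN objective itself; the function name `info_nce_loss`, the `temperature` default, and the batch layout (row i of `positives` is the positive for row i of `anchors`) are assumptions made for illustration.

```python
import numpy as np

def info_nce_loss(anchors, positives, temperature=0.07):
    """Generic InfoNCE loss over a batch of embedding pairs.

    anchors, positives: arrays of shape (N, D). Row i of `positives` is
    treated as the positive for row i of `anchors`; every other row in the
    batch is treated as a negative (illustrative assumption).
    """
    # L2-normalise so that dot products are cosine similarities.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)

    # (N, N) similarity matrix scaled by the temperature.
    logits = a @ p.T / temperature

    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # The matching pair for each anchor sits on the diagonal.
    return float(-np.mean(np.diag(log_probs)))
```

Because every other sample in the batch counts as a negative, a loss of this form has no access to category labels, so two clips of the same action from different videos are pushed apart. This appears to be the situation the abstract refers to as the hard-positive problem.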
Taxonomy
Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
Methods: Contrastive Learning · Average Pooling · Global Average Pooling · Residual Connection · Dense Connections · (2+1)D Convolution · Batch Normalization · R(2+1)D
