Inter-intra Variant Dual Representations for Self-supervised Video Recognition

Lin Zhang; Qi She; Zhengyang Shen; Changhu Wang

arXiv:2107.01194·cs.CV·October 26, 2021·1 cites

TL;DR

This paper introduces a dual representation learning approach for self-supervised video recognition that captures both intra-variance (among clips of the same video) and inter-variance, yielding consistent gains across multiple backbones and contrastive learning frameworks.

Contribution

It proposes a novel dual representation method that encodes intra-variance with a shuffle-rank task and inter-variance with a contrastive loss, enhancing self-supervised video learning.

Findings

01 Achieves 82.0% accuracy on UCF101 with SimCLR

02 Attains 51.2% accuracy on HMDB51

03 Reaches 46.1% video retrieval accuracy on UCF101

Abstract

Contrastive learning applied to self-supervised representation learning has seen a resurgence in deep models. In this paper, we find that existing contrastive learning based solutions for self-supervised video recognition focus on inter-variance encoding but ignore the intra-variance existing in clips within the same video. We thus propose to learn dual representations for each clip which (i) encode intra-variance through a shuffle-rank pretext task; (ii) encode inter-variance through a temporal coherent contrastive loss. Experiment results show that our method plays an essential role in balancing inter and intra variances and brings consistent performance gains on multiple backbones and contrastive learning frameworks. Integrated with SimCLR and pretrained on Kinetics-400, our method achieves 82.0% and 51.2% downstream classification…
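
For readers unfamiliar with the contrastive objective the abstract refers to, the following is a minimal sketch of a generic InfoNCE-style loss for a single anchor clip. It is not the paper's temporal coherent contrastive loss, whose exact form is not given on this page. The function names, the toy 2-D embeddings, and the temperature of 0.1 are all assumptions made for illustration: a clip from the same video acts as the positive, and clips from other videos act as negatives.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    """Generic InfoNCE loss for one anchor (illustrative, not the paper's loss).

    The loss is -log( exp(s_pos / t) / sum_k exp(s_k / t) ), where the sum
    runs over the positive and all negatives.
    """
    logits = [cosine_similarity(anchor, positive) / temperature]
    logits += [cosine_similarity(anchor, n) / temperature for n in negatives]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    max_logit = max(logits)
    log_denominator = max_logit + math.log(
        sum(math.exp(l - max_logit) for l in logits)
    )
    return log_denominator - logits[0]


# Toy embeddings (hypothetical values, chosen only to show the behaviour).
anchor = [1.0, 0.0]
same_video_clip = [0.9, 0.1]
other_video_clips = [[0.0, 1.0], [-1.0, 0.0]]

# Positive taken from the same video: the loss is small.
loss_aligned = info_nce_loss(anchor, same_video_clip, other_video_clips)

# Positive taken from a different video: the loss is large.
loss_misaligned = info_nce_loss(
    anchor, other_video_clips[0], [same_video_clip, other_video_clips[1]]
)
```

In this sketch the loss only pulls clips of the same video together and pushes other videos away, which is the inter-variance side of the method. The intra-variance side (the shuffle-rank pretext task) is a separate objective that this example does not attempt to reproduce.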

Code & Models

Repositories

lzhangbj/DualVar (PyTorch, official)

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Human Pose and Action Recognition

Methods: Contrastive Learning · 1x1 Convolution · Convolution · Batch Normalization · Residual Connection · Average Pooling · Global Average Pooling · Bottleneck Residual Block · Kaiming Initialization