Inter-intra Variant Dual Representations for Self-supervised Video Recognition
Lin Zhang, Qi She, Zhengyang Shen, Changhu Wang

TL;DR
This paper introduces a dual representation learning approach for self-supervised video recognition that captures both the intra-variance among clips of the same video and the inter-variance across videos, yielding consistent performance gains across multiple backbones, contrastive learning frameworks, and benchmarks.
Contribution
It proposes a novel dual representation method that encodes intra-variance with a shuffle-rank task and inter-variance with a contrastive loss, enhancing self-supervised video learning.
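The summary names a shuffle-rank task but does not spell out its form. The sketch below is one plausible instantiation, assuming the task asks a scoring head to rank a clip's original frame order above a shuffled copy of the same frames via a margin (hinge) ranking loss. `shuffle_frames`, `toy_order_score`, `margin_rank_loss`, and the margin value are hypothetical illustrations, not the authors' code; in particular, `toy_order_score` is a hand-written stand-in for what would be a learned network on the intra-variant representation.

```python
import random
from typing import List, Sequence


def shuffle_frames(frames: Sequence[int], rng: random.Random) -> List[int]:
    """Return the frames in a random order guaranteed to differ from the original order.

    Requires at least two frames so that a non-identity permutation exists.
    """
    if len(frames) < 2:
        raise ValueError("need at least two frames to shuffle")
    order = list(range(len(frames)))
    identity = list(order)
    while order == identity:  # re-draw until the permutation is not the identity
        rng.shuffle(order)
    return [frames[i] for i in order]


def toy_order_score(frames: Sequence[int]) -> float:
    """HYPOTHETICAL stand-in for a learned scoring head on the intra-variant
    representation: the fraction of adjacent frame pairs whose timestamps increase."""
    pairs = len(frames) - 1
    increasing = sum(1 for a, b in zip(frames, frames[1:]) if a < b)
    return increasing / pairs


def margin_rank_loss(score_ordered: float, score_shuffled: float, margin: float = 0.5) -> float:
    """Hinge ranking loss: zero once the ordered clip outscores the shuffled copy by `margin`."""
    return max(0.0, margin - (score_ordered - score_shuffled))


# Usage: frames are represented by their timestamps 0..7 within one clip.
rng = random.Random(0)
clip = list(range(8))
shuffled = shuffle_frames(clip, rng)
loss = margin_rank_loss(toy_order_score(clip), toy_order_score(shuffled))
print(shuffled, toy_order_score(shuffled), loss)
```

Because any non-identity permutation of distinct timestamps contains at least one descending adjacent pair, the shuffled copy always scores below the ordered clip here; a learned head would have to discover that ordering signal from frame content instead.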
Findings
Achieves 82.0% accuracy on UCF101 with SimCLR
Attains 51.2% accuracy on HMDB51
Reaches 46.1% video retrieval accuracy on UCF101
Abstract
Contrastive learning applied to self-supervised representation learning has seen a resurgence in deep models. In this paper, we find that existing contrastive learning based solutions for self-supervised video recognition focus on inter-variance encoding but ignore the intra-variance existing in clips within the same video. We thus propose to learn dual representations for each clip that (i) encode intra-variance through a shuffle-rank pretext task and (ii) encode inter-variance through a temporal coherent contrastive loss. Experimental results show that our method plays an essential role in balancing inter- and intra-variance and brings consistent performance gains across multiple backbones and contrastive learning frameworks. Integrated with SimCLR and pretrained on Kinetics-400, our method achieves and downstream classification…
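The abstract does not define the temporal coherent contrastive loss in detail. As a minimal sketch, assuming that two clips sampled from the same video form a positive pair and clips from other videos in the batch serve as negatives, the inter-variance objective can be written as an InfoNCE-style loss, a simplified one-directional variant of SimCLR's NT-Xent. The function names, the temperature of 0.1, and the 2-D toy embeddings are illustrative assumptions, and the paper's actual clip-sampling scheme may differ.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    """Scale a (nonzero) vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Dot product of the L2-normalized vectors."""
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))


def info_nce(anchors: Sequence[Vector], positives: Sequence[Vector], temperature: float = 0.1) -> float:
    """Mean InfoNCE loss over a batch.

    anchors[i] and positives[i] are embeddings of two clips from the same video
    (ASSUMED positive pair); positives[j] for j != i, i.e. clips from other
    videos, act as negatives.
    """
    if len(anchors) != len(positives):
        raise ValueError("anchors and positives must have the same batch size")
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine_similarity(anchor, p) / temperature for p in positives]
        m = max(logits)  # subtract the max before exponentiating for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]  # cross-entropy with target index i
    return total / len(anchors)


# Usage: two videos, two clips each, embedded in 2-D.
clips_a = [[1.0, 0.0], [0.0, 1.0]]
clips_b_matched = [[0.9, 0.1], [0.1, 0.9]]  # same-video clips point the same way
clips_b_swapped = [[0.1, 0.9], [0.9, 0.1]]  # same-video clips point different ways
print(info_nce(clips_a, clips_b_matched), info_nce(clips_a, clips_b_swapped))
```

The loss is near zero when each clip's embedding is closest to its same-video partner and grows large when it is closer to another video's clip, which is the inter-variance behaviour the abstract describes. On its own, such a loss pulls all clips of a video together; the shuffle-rank branch is what the paper adds to keep intra-video differences encoded.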
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Human Pose and Action Recognition
Methods: Contrastive Learning · 1x1 Convolution · Convolution · Batch Normalization · Residual Connection · Average Pooling · Global Average Pooling · Bottleneck Residual Block · Kaiming Initialization
