Motion Sensitive Contrastive Learning for Self-supervised Video Representation

Jingcheng Ni; Nan Zhou; Jie Qin; Qian Wu; Junqi Liu; Boxun Li; Di Huang

arXiv:2208.06105 · cs.CV · August 15, 2022 · 1 citation

TL;DR

This paper introduces Motion Sensitive Contrastive Learning (MSCL), a self-supervised video representation method that injects motion information captured by optical flow into RGB frames. It reports 91.5% top-1 accuracy on UCF101 and 50.3% on Something-Something v2.

Contribution

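The paper proposes MSCL, which adds three components to clip-level global contrastive learning: Local Motion Contrastive Learning (LMCL), which applies frame-level contrastive objectives across the RGB and optical-flow modalities; Flow Rotation Augmentation (FRA), which generates extra motion-shuffled negative samples; and Motion Differential Sampling (MDS), which screens training samples. Together these make the learned representation explicitly sensitive to short-term motion dynamics.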
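As an illustration of how a flow-based negative sample could be built, the sketch below rotates every optical-flow vector by a fixed angle, which keeps motion magnitude but changes motion direction. This is only one possible reading of "flow rotation": the summary here does not specify how the paper's flow rotation augmentation is implemented, and the function name `rotate_flow_vectors`, the `(T, H, W, 2)` layout, and the choice of angle are all assumptions made for the example.

```python
import numpy as np

def rotate_flow_vectors(flow, angle_rad):
    """Rotate each (u, v) flow vector by angle_rad in the (u, v) plane.

    flow: array of shape (T, H, W, 2), last axis holding (u, v) displacements.
    Hypothetical helper for illustration; not the paper's implementation.
    """
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    rot = np.array([[c, -s],
                    [s,  c]])
    # Right-multiplying by rot.T applies rot to every vector on the last axis.
    return flow @ rot.T

# Usage: a flow field moving purely along u becomes one moving purely along v.
flow = np.zeros((4, 8, 8, 2))
flow[..., 0] = 1.0
negative = rotate_flow_vectors(flow, np.pi / 2)
```

Under this assumption, the rotated flow has the same per-pixel speed as the original, so a model can only tell the two apart by attending to motion direction.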

Findings

01. Achieves 91.5% top-1 accuracy on UCF101

02. Attains 50.3% top-1 accuracy on Something-Something v2

03. Reaches 65.6% top-1 recall on UCF101 for retrieval

Abstract

Contrastive learning has shown great potential in video representation learning. However, existing approaches fail to sufficiently exploit short-term motion dynamics, which are crucial to various down-stream video understanding tasks. In this paper, we propose Motion Sensitive Contrastive Learning (MSCL) that injects the motion information captured by optical flows into RGB frames to strengthen feature learning. To achieve this, in addition to clip-level global contrastive learning, we develop Local Motion Contrastive Learning (LMCL) with frame-level contrastive objectives across the two modalities. Moreover, we introduce Flow Rotation Augmentation (FRA) to generate extra motion-shuffled negative samples and Motion Differential Sampling (MDS) to accurately screen training samples. Extensive experiments on standard benchmarks validate the effectiveness of the proposed method. With the…
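A frame-level contrastive objective across two modalities can be sketched with a standard InfoNCE loss, shown below in NumPy. This is a generic illustration, not the paper's LMCL loss: it assumes per-frame feature vectors for RGB and optical flow are already available, treats the same frame index in both modalities as the positive pair and all other frames in the clip as negatives, and uses a hypothetical function name and temperature value.

```python
import numpy as np

def frame_level_info_nce(rgb_feats, flow_feats, temperature=0.1):
    """InfoNCE over frames: row t of rgb_feats should match row t of flow_feats.

    rgb_feats, flow_feats: arrays of shape (T, D), one feature vector per frame.
    Hypothetical sketch; the actual LMCL objective may differ.
    """
    # L2-normalise so the dot product is cosine similarity.
    rgb = rgb_feats / np.linalg.norm(rgb_feats, axis=1, keepdims=True)
    flow = flow_feats / np.linalg.norm(flow_feats, axis=1, keepdims=True)

    logits = rgb @ flow.T / temperature            # (T, T) similarity matrix
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # Positives sit on the diagonal (same frame index in both modalities).
    return float(-np.mean(np.diag(log_prob)))

# Usage: temporally aligned features should score a lower loss than misaligned ones.
rng = np.random.default_rng(0)
rgb = rng.normal(size=(8, 16))
flow = rgb + 0.01 * rng.normal(size=(8, 16))
loss_aligned = frame_level_info_nce(rgb, flow)
loss_shifted = frame_level_info_nce(rgb, np.roll(flow, 1, axis=0))
```

Minimising such a loss pulls each RGB frame feature toward the flow feature of the same frame and away from the flow features of other frames, which is one way to make RGB features sensitive to short-term motion.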

Taxonomy

Topics: Advanced Vision and Imaging · Human Pose and Action Recognition · Domain Adaptation and Few-Shot Learning

Methods: Contrastive Learning