MoQuad: Motion-focused Quadruple Construction for Video Contrastive Learning

Yuan Liu; Jiacheng Chen; Hao Wu

arXiv:2212.10870·cs.CV·December 22, 2022·1 cites

TL;DR

This paper introduces MoQuad, a simple yet effective contrastive learning strategy that enhances motion feature learning in videos by disturbing appearance and motion in positive and negative samples, leading to superior downstream task performance.

Contribution

MoQuad is a novel quadruple construction method that improves motion feature learning without extra auxiliary tasks or explicit temporal modeling.

Findings

01

Achieves 93.7% accuracy on UCF-101 after 200 epochs of pre-training.

02

Outperforms state-of-the-art methods on downstream video tasks.

03

Maintains a simple contrastive learning framework without multi-task learning.

Abstract

Learning effective motion features is an essential pursuit of video representation learning. This paper presents a simple yet effective sample construction strategy to boost the learning of motion features in video contrastive learning. The proposed method, dubbed Motion-focused Quadruple Construction (MoQuad), augments the instance discrimination by meticulously disturbing the appearance and motion of both the positive and negative samples to create a quadruple for each video instance, such that the model is encouraged to exploit motion information. Unlike recent approaches that create extra auxiliary tasks for learning motion features or apply explicit temporal modelling, our method keeps the simple and clean contrastive learning paradigm (i.e., SimCLR) without multi-task learning or extra modelling. In addition, we design two extra training strategies by analyzing initial MoQuad…
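
As a rough illustration of the contrastive objective the abstract refers to, the sketch below computes a generic InfoNCE-style loss (the kind used in SimCLR-like setups) for one anchor embedding against a set of positives and a set of negatives. It is not the authors' implementation. The function names (`l2_normalize`, `info_nce`), the temperature value, and the way a quadruple would be split into positives and negatives are all assumptions made for illustration; the abstract does not specify how the disturbed samples are built or how they enter the loss.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale vectors along the last axis to unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def info_nce(anchor, positives, negatives, temperature=0.1):
    """Generic InfoNCE-style loss for a single anchor (hypothetical helper).

    anchor:    (d,)   embedding of the anchor clip
    positives: (p, d) embeddings treated as positives
    negatives: (n, d) embeddings treated as negatives
    Each positive is contrasted against all negatives; the per-positive
    losses are averaged.
    """
    a = l2_normalize(np.asarray(anchor, dtype=float))
    pos = l2_normalize(np.asarray(positives, dtype=float))
    neg = l2_normalize(np.asarray(negatives, dtype=float))

    pos_logits = pos @ a / temperature  # cosine similarity / temperature, shape (p,)
    neg_logits = neg @ a / temperature  # shape (n,)

    losses = []
    for p in pos_logits:
        logits = np.concatenate(([p], neg_logits))
        m = logits.max()  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + np.log(np.exp(logits - m).sum())
        losses.append(log_sum_exp - p)  # equals -log softmax of the positive
    return float(np.mean(losses))


if __name__ == "__main__":
    # Toy 2-D embeddings standing in for encoder outputs (assumed values).
    anchor = np.array([1.0, 0.0])
    positives = np.array([[1.0, 0.0], [0.9, 0.1]])
    negatives = np.array([[-1.0, 0.0], [0.0, 1.0]])
    print(info_nce(anchor, positives, negatives))
```

In a quadruple setup, one would presumably pass the samples that share the anchor's motion as `positives` and the samples whose motion has been disturbed as `negatives`, so that appearance alone cannot separate the two groups; that mapping is an assumption here, not a detail taken from the abstract.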

Taxonomy

Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Advanced Vision and Imaging

Methods: Average Pooling · Batch Normalization · Residual Block · Global Average Pooling · 1x1 Convolution · Kaiming Initialization · Convolution · Dense Connections · Residual Connection