MoQuad: Motion-focused Quadruple Construction for Video Contrastive Learning
Yuan Liu, Jiacheng Chen, Hao Wu

TL;DR
This paper introduces MoQuad, a simple yet effective contrastive learning strategy that enhances motion feature learning in videos by disturbing appearance and motion in positive and negative samples, leading to superior downstream task performance.
Contribution
MoQuad is a novel quadruple construction method that improves motion feature learning without extra auxiliary tasks or explicit temporal modeling.
Findings
Achieves 93.7% accuracy on UCF-101 after 200 epochs of pre-training.
Outperforms state-of-the-art methods on downstream video tasks.
Maintains a simple contrastive learning framework without multi-task learning.
Abstract
Learning effective motion features is an essential pursuit of video representation learning. This paper presents a simple yet effective sample construction strategy to boost the learning of motion features in video contrastive learning. The proposed method, dubbed Motion-focused Quadruple Construction (MoQuad), augments the instance discrimination by meticulously disturbing the appearance and motion of both the positive and negative samples to create a quadruple for each video instance, such that the model is encouraged to exploit motion information. Unlike recent approaches that create extra auxiliary tasks for learning motion features or apply explicit temporal modelling, our method keeps the simple and clean contrastive learning paradigm (i.e., SimCLR) without multi-task learning or extra modelling. In addition, we design two extra training strategies by analyzing initial MoQuad…
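The sample construction idea in the abstract can be illustrated with a toy sketch. Everything below is an assumption made for illustration, not the authors' implementation: the split into two positives and two negatives, the helper names (disturb_appearance, disturb_motion, motion_signature, build_quadruple, info_nce), the brightness-offset and frame-reversal disturbances, and the use of frame differences in place of a learned encoder. It only shows why samples that differ in appearance but share motion can act as positives, while samples that share appearance but differ in motion can act as negatives.

```python
import math

# Toy clip: a list of frames, each frame a list of pixel intensities.
# All helpers below are hypothetical stand-ins, not the paper's code.

def disturb_appearance(clip, offset=0.3):
    """Shift every pixel by one offset: appearance changes, motion does not."""
    return [[p + offset for p in frame] for frame in clip]

def disturb_motion(clip):
    """Reverse the frame order: per-frame appearance is kept, motion changes."""
    return clip[::-1]

def motion_signature(clip):
    """Frame-to-frame differences, a crude stand-in for a learned embedding."""
    return [b - a for f0, f1 in zip(clip, clip[1:]) for a, b in zip(f0, f1)]

def build_quadruple(clip):
    """Assumed layout: two positives (same motion), two negatives (other motion)."""
    positives = [clip, disturb_appearance(clip)]
    negatives = [disturb_motion(clip), disturb_motion(disturb_appearance(clip))]
    return positives, negatives

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(anchor, positives, negatives, temperature=0.1):
    """InfoNCE-style loss: each positive is contrasted against all negatives."""
    neg_sum = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    losses = []
    for p in positives:
        pos = math.exp(cosine(anchor, p) / temperature)
        losses.append(-math.log(pos / (pos + neg_sum)))
    return sum(losses) / len(losses)

clip = [[0.0, 0.1], [0.2, 0.4], [0.6, 0.9]]  # 3 frames, 2 pixels each
pos_clips, neg_clips = build_quadruple(clip)

anchor = motion_signature(clip)
positives = [motion_signature(c) for c in pos_clips]
negatives = [motion_signature(c) for c in neg_clips]
loss = info_nce(anchor, positives, negatives)
```

Because the toy embedding depends only on frame differences, the appearance-disturbed samples stay close to the anchor and the motion-disturbed samples move away from it, so the loss is small. A real encoder has no such built-in invariance; a construction of this kind is meant to push it toward motion cues rather than appearance shortcuts.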
Taxonomy
Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Advanced Vision and Imaging
Methods: Average Pooling · Batch Normalization · Residual Block · Global Average Pooling · 1x1 Convolution · Kaiming Initialization · Convolution · Dense Connections · Residual Connection
