Controllable Augmentations for Video Representation Learning

Rui Qian; Weiyao Lin; John See; Dian Li

arXiv:2203.16632 · cs.CV · April 4, 2022 · 1 citation

Open Access

TL;DR

This paper introduces a controllable augmentation framework for self-supervised video representation learning, improving temporal modeling and generalization by combining local and global information with contrastive learning.

Contribution

The paper proposes a novel controllable augmentation approach that jointly leverages local clips and global videos to better learn the temporal structure of videos.
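
This page does not say how the augmentations are implemented. One plausible reading of "controllable", assumed here, is that the position of each local clip inside its global video is recorded at sampling time, so the region-level correspondence mentioned in the abstract is known exactly rather than guessed. The sketch below illustrates that assumption only. `Box3D`, `sample_local_clip`, `soft_overlap_weights`, and the grid sizes are hypothetical names and parameters, not the paper's code.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Box3D:
    """Half-open spatio-temporal box [t0, t1) x [y0, y1) x [x0, x1) in global-video coordinates."""
    t0: int
    t1: int
    y0: int
    y1: int
    x0: int
    x1: int


def sample_local_clip(T, H, W, clip_len, crop_h, crop_w, rng):
    """Sample a local clip from a T x H x W global video and record its exact position.

    Keeping the coordinates is what makes the augmentation "controllable" in this
    hypothetical sketch: we know which part of the global video the clip covers.
    """
    t0 = rng.randrange(0, T - clip_len + 1)
    y0 = rng.randrange(0, H - crop_h + 1)
    x0 = rng.randrange(0, W - crop_w + 1)
    return Box3D(t0, t0 + clip_len, y0, y0 + crop_h, x0, x0 + crop_w)


def _overlap(a0, a1, b0, b1):
    """Length of the intersection of intervals [a0, a1) and [b0, b1)."""
    return max(0.0, min(a1, b1) - max(a0, b0))


def soft_overlap_weights(box, T, H, W, grid_t, grid_h, grid_w):
    """Soft correspondence between a local clip and each cell of a global feature grid.

    Each cell's raw weight is the fraction of its volume covered by the clip; the
    weights are then normalized to sum to 1 so they can act as a soft target.
    """
    ct, ch, cw = T / grid_t, H / grid_h, W / grid_w  # cell extent in frames / pixels
    raw = {}
    for i in range(grid_t):
        ft = _overlap(box.t0, box.t1, i * ct, (i + 1) * ct) / ct
        for j in range(grid_h):
            fy = _overlap(box.y0, box.y1, j * ch, (j + 1) * ch) / ch
            for k in range(grid_w):
                fx = _overlap(box.x0, box.x1, k * cw, (k + 1) * cw) / cw
                raw[(i, j, k)] = ft * fy * fx
    total = sum(raw.values())  # > 0 whenever the clip has non-zero size
    return {cell: w / total for cell, w in raw.items()}


# Usage: an 8-frame 56x56 clip from a 16-frame 112x112 video, scored on a 4x4x4 grid.
rng = random.Random(0)
clip = sample_local_clip(T=16, H=112, W=112, clip_len=8, crop_h=56, crop_w=56, rng=rng)
weights = soft_overlap_weights(clip, 16, 112, 112, grid_t=4, grid_h=4, grid_w=4)
```

A clip that straddles cell boundaries spreads its weight over several cells instead of being assigned to one, which is one simple way to make a region-level target "soft".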

Findings

01 Outperforms existing methods on three video benchmarks.
02 Models temporal dynamics more accurately.
03 Improves generalization through mutual information minimization.

Abstract

This paper focuses on self-supervised video representation learning. Most existing approaches follow the contrastive learning pipeline, constructing positive and negative pairs by sampling different clips. However, this formulation tends to be biased toward static backgrounds and has difficulty establishing global temporal structures. The main reason is that the positive pairs, i.e., different clips sampled from the same video, have a limited temporal receptive field and usually share similar backgrounds but differ in motion. To address these problems, we propose a framework that jointly utilizes local clips and global videos to learn from detailed region-level correspondence as well as general long-term temporal relations. Based on a set of controllable augmentations, we achieve accurate appearance and motion pattern alignment through soft spatio-temporal region contrast. Our formulation is able to…
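
The abstract contrasts the usual clip-level pipeline (two clips from one video form a positive pair, clips from other videos are negatives) with a "soft" region contrast, but the loss itself is cut off here. The sketch below assumes the common InfoNCE form for the baseline and a hypothetical soft-target generalization: cross-entropy between a target distribution and a softmax over cosine similarities. With one-hot targets it reduces to InfoNCE; with spread-out targets (for example, normalized overlap fractions between a local clip and global-video regions, which is an assumption) it rewards partial correspondence. `soft_contrastive_loss` and `temperature=0.1` are illustrative choices, not the paper's.

```python
import numpy as np


def log_softmax(x, axis=-1):
    """Numerically stable log-softmax along `axis`."""
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max to avoid overflow in exp
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))


def soft_contrastive_loss(queries, keys, targets, temperature=0.1):
    """Cross-entropy between soft targets and a softmax over cosine similarities.

    queries: (N, D) anchor embeddings, e.g. clip or region features
    keys:    (M, D) candidate embeddings; non-target keys act as negatives
    targets: (N, M) non-negative rows summing to 1. A one-hot row is the standard
             InfoNCE positive; a spread-out row encodes soft correspondence.
    """
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)  # L2-normalize rows
    k = keys / np.linalg.norm(keys, axis=1, keepdims=True)
    logits = q @ k.T / temperature  # (N, M) temperature-scaled cosine similarities
    return float(-(targets * log_softmax(logits, axis=1)).sum(axis=1).mean())


# Baseline InfoNCE: row i of `keys` is the positive for row i of `queries`.
q = np.eye(4)
aligned = soft_contrastive_loss(q, np.eye(4), np.eye(4))           # near 0
shuffled = soft_contrastive_loss(q, np.eye(4)[[1, 2, 3, 0]], np.eye(4))  # about 10

# Soft target: one query that matches two keys equally well.
soft = soft_contrastive_loss(np.array([[1.0, 1.0, 0.0, 0.0]]), np.eye(4),
                             np.array([[0.5, 0.5, 0.0, 0.0]]))     # close to log(2)
```

The soft loss cannot fall below the entropy of the target row, so a 50/50 target bottoms out near log 2 rather than 0; a model is rewarded for matching the target distribution, not for collapsing onto a single key.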

Taxonomy

Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Advanced Vision and Imaging

Methods: Contrastive Learning