Dual Contrastive Learning for Spatio-temporal Representation

Shuangrui Ding; Rui Qian; Hongkai Xiong

arXiv:2207.05340 · cs.CV · July 13, 2022

TL;DR

This paper introduces DCLR, a dual contrastive learning approach that decouples static scene and dynamic motion features to improve self-supervised spatio-temporal video representation learning.

Contribution

The paper proposes a novel dual contrastive formulation that decouples static and dynamic features, addressing background bias in video contrastive learning.

Findings

01

Achieves state-of-the-art or comparable performance on the UCF-101, HMDB-51, and Diving-48 datasets.

02

Effectively encodes static and dynamic features into RGB representations.

03

Demonstrates improved discrimination of motion patterns over background scenes.

Abstract

Contrastive learning has shown promising potential in self-supervised spatio-temporal representation learning. Most works naively sample different clips to construct positive and negative pairs. However, we observe that this formulation inclines the model towards the background scene bias. The underlying reasons are twofold. First, the scene difference is usually more noticeable and easier to discriminate than the motion difference. Second, the clips sampled from the same video often share similar backgrounds but have distinct motions. Simply regarding them as positive pairs will draw the model to the static background rather than the motion pattern. To tackle this challenge, this paper presents a novel dual contrastive formulation. Concretely, we decouple the input RGB video sequence into two complementary modes, static scene and dynamic motion. Then, the original RGB features are…
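
The background bias described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the `info_nce` helper, the temperature value, and the synthetic "background" and "motion" vectors are all assumptions made for illustration. It only shows that, when two clips from the same video are treated as a positive pair, an embedding that encodes the shared background already scores a much lower contrastive loss than one that encodes the clip-specific motion.

```python
import numpy as np


def info_nce(anchors, positives, temperature=0.1):
    """Standard InfoNCE loss (hypothetical helper, not from the paper).

    Row i of `anchors` is paired with row i of `positives`;
    every other row of `positives` serves as a negative.
    """
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                    # (N, N) scaled cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # subtract row max for stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))         # positives sit on the diagonal


# Synthetic stand-ins for clip embeddings (assumed, for illustration only).
rng = np.random.default_rng(0)
num_videos, dim = 8, 16
background = rng.normal(size=(num_videos, dim))  # identical for both clips of a video
motion_a = rng.normal(size=(num_videos, dim))    # motion content of the first clip
motion_b = rng.normal(size=(num_videos, dim))    # different motion in the second clip

# An encoder that keeps only the background sees identical positives.
loss_background_only = info_nce(background, background)
# An encoder that keeps only the motion sees unrelated positives.
loss_motion_only = info_nce(motion_a, motion_b)

print(f"background-only loss: {loss_background_only:.3f}")
print(f"motion-only loss:     {loss_motion_only:.3f}")
```

In this toy setting the optimizer has an obvious shortcut through the background features, which is the failure mode the dual contrastive formulation is designed to counter.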
