Cycle-Contrast for Self-Supervised Video Representation Learning

Quan Kong; Wenpeng Wei; Ziwei Deng; Tomoaki Yoshinaga; Tomokazu Murakami

arXiv:2010.14810 · cs.CV · October 29, 2020 · 32 cites

TL;DR

Cycle-Contrastive Learning (CCL) is a self-supervised approach that learns video representations by establishing correspondences across frames and videos, improving performance on downstream video understanding tasks.

Contribution

The paper introduces a cycle-contrastive learning method that models the relation between a video and its frames within a single network, rather than learning correspondences only across frames or clips.

Findings

01. Outperforms previous methods in nearest-neighbour retrieval.

02. Achieves higher accuracy than previous methods in action recognition.

03. Transfers well to downstream video understanding tasks.

Abstract

We present Cycle-Contrastive Learning (CCL), a novel self-supervised method for learning video representations. Motivated by the natural belonging and inclusion relation between a video and its frames, CCL is designed to find correspondences across frames and videos while taking into account the contrastive representation in each of their respective domains. This differs from recent approaches that learn correspondences only across frames or clips. In our method, the frame and video representations are learned by a single network based on an R3D architecture, with a shared non-linear transformation that embeds both frame and video features before the cycle-contrastive loss. We demonstrate that the video representation learned by CCL transfers well to downstream video understanding tasks, outperforming previous methods in nearest neighbour retrieval and action recognition tasks on…
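
The pipeline described in the abstract (one shared non-linear embedding for frame and video features, followed by a loss over frame-to-video-to-frame cycles) can be illustrated with a minimal NumPy sketch. This is not the paper's formulation: the two-layer head, the soft nearest-neighbour cycle, the temperature value, and the names `shared_projection` and `cycle_contrast_loss` are all assumptions made for illustration, and the inputs stand in for features that an R3D backbone would produce.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products act as cosine similarity."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def softmax(logits):
    """Numerically stable row-wise softmax."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def shared_projection(features, w1, w2):
    """Hypothetical shared head: linear -> ReLU -> linear, then L2-normalise.

    The same weights are applied to frame and video features, mirroring the
    "shared non-linear transformation" mentioned in the abstract.
    """
    hidden = np.maximum(features @ w1, 0.0)
    return l2_normalize(hidden @ w2)


def cycle_contrast_loss(frame_emb, video_emb, temperature=0.1):
    """Illustrative cycle loss (assumed form, not taken from the paper).

    frame_emb: (N, D) embedded frames, video_emb: (V, D) embedded videos.
    """
    # Forward step: each frame attends over all videos to get a soft nearest video.
    frame_to_video = softmax(frame_emb @ video_emb.T / temperature)   # (N, V)
    soft_video = l2_normalize(frame_to_video @ video_emb)             # (N, D)

    # Backward step: from the soft video, attend back over all frames.
    video_to_frame = softmax(soft_video @ frame_emb.T / temperature)  # (N, N)

    # Contrastive objective: the cycle should return to the starting frame,
    # with every other frame acting as a negative.
    idx = np.arange(frame_emb.shape[0])
    return_prob = np.clip(video_to_frame[idx, idx], 1e-12, 1.0)
    return float(-np.log(return_prob).mean())
```

In this sketch every other frame in the batch serves as a negative in the backward step; a real implementation would need to decide how frames from the same video are treated, which the abstract does not specify.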

Videos

Cycle-Contrast for Self-Supervised Video Representation Learning · SlidesLive

Taxonomy

Topics: Human Pose and Action Recognition · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications