Contrastive Learning of Image Representations with Cross-Video Cycle-Consistency
Haiping Wu, Xiaolong Wang

TL;DR
This paper introduces a novel contrastive learning approach that leverages cross-video cycle-consistency to learn image representations. This improves performance on downstream tasks such as object tracking, classification, and action recognition without requiring cross-video labels.
Contribution
It proposes a new contrastive learning method that exploits cross-video relations through cycle-consistency, enhancing high-level semantic representation learning.
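The summary does not spell out the training objective, so the following is only a hedged illustration of how cross-video cycle-consistency can be turned into a contrastive loss. It assumes a "cycle-back" formulation in the spirit of temporal cycle-consistency. A frame embedding from video A jumps to its soft nearest neighbour among the frames of video B, then jumps back to A. Landing on the starting frame is treated as the positive in a softmax over A's frames. The function names, the temperature of 0.1, and the toy 2-D embeddings are hypothetical, and the authors' actual loss may differ.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def normalize(v: Vector) -> List[float]:
    """L2-normalize so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cycle_back_loss(video_a: List[Vector], video_b: List[Vector],
                    start: int, temperature: float = 0.1) -> float:
    """Hypothetical cross-video cycle-consistency loss for frame `start` of video A.

    1. Jump from A[start] to its soft nearest neighbour in video B.
    2. Jump back to video A and score every frame of A.
    3. Treat returning to A[start] as the positive (cross-entropy over A's frames).
    """
    a = [normalize(v) for v in video_a]
    b = [normalize(v) for v in video_b]
    anchor = a[start]

    # Step 1: softmax weights over B's frames, then their weighted average.
    weights = softmax([dot(anchor, bj) / temperature for bj in b])
    dim = len(anchor)
    soft_nn = [sum(w * bj[d] for w, bj in zip(weights, b)) for d in range(dim)]

    # Step 2: similarities from the soft neighbour back to every frame of A.
    logits = [dot(soft_nn, ai) / temperature for ai in a]

    # Step 3: -log p(start), computed with a stable log-sum-exp.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[start]


if __name__ == "__main__":
    video_a = [[1.0, 0.0], [0.0, 1.0]]
    matching_b = [[1.0, 0.1], [0.1, 1.0]]  # B has a counterpart for each A frame
    unrelated_b = [[0.0, 1.0]]             # B only resembles A's second frame
    print(cycle_back_loss(video_a, matching_b, start=0))   # small: cycle returns to frame 0
    print(cycle_back_loss(video_a, unrelated_b, start=0))  # large: cycle lands on frame 1
```

Using a soft (softmax-weighted) nearest neighbour instead of a hard arg-max keeps the whole cycle differentiable. No cross-video labels are needed, because the only supervision signal is whether the cycle closes on the frame it started from.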
Findings
Significant improvements on object tracking, classification, and action recognition tasks.
Effective use of cross-video relations without human-annotated labels.
Outperforms state-of-the-art contrastive learning methods.
Abstract
Recent works have advanced the performance of self-supervised representation learning by a large margin. At the core of these methods is intra-image invariance learning: two different transformations of one image instance are considered a positive sample pair, and various tasks are designed to learn invariant representations by comparing the pair. Analogously, for video data, representations of frames from the same video are trained to be closer than frames from other videos, i.e. intra-video invariance. However, cross-video relations have barely been explored for visual representation learning. Unlike intra-video invariance, ground-truth labels of cross-video relations are usually unavailable without human labor. In this paper, we propose a novel contrastive learning method which explores the cross-video relation by using cycle-consistency for general image representation learning.…
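The intra-image and intra-video invariance described in the abstract is commonly trained with an InfoNCE-style contrastive loss. The sketch below assumes that loss with in-batch negatives. The name `info_nce`, the temperature of 0.1, and the toy embeddings are illustrative choices, not taken from the paper. Row i of `queries` and `keys` is a positive pair: either two augmented views of one image or two frames of one video. All other keys in the batch serve as negatives. It is written in plain Python for readability; a real pipeline would use a tensor library.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def normalize(v: Vector) -> List[float]:
    """L2-normalize so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def info_nce(queries: List[Vector], keys: List[Vector],
             temperature: float = 0.1) -> float:
    """Mean InfoNCE loss over a batch.

    queries[i] and keys[i] form the positive pair (two augmentations of one
    image, or two frames of one video); every keys[j] with j != i is a negative.
    """
    q = [normalize(v) for v in queries]
    k = [normalize(v) for v in keys]
    total = 0.0
    for i, qi in enumerate(q):
        logits = [dot(qi, kj) / temperature for kj in k]
        m = max(logits)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]  # -log softmax probability of the positive
    return total / len(q)


if __name__ == "__main__":
    views_1 = [[1.0, 0.0], [0.0, 1.0]]
    views_2 = [[1.0, 0.0], [0.0, 1.0]]
    print(info_nce(views_1, views_2))        # near 0: each view matches its partner
    print(info_nce(views_1, views_2[::-1]))  # about 10: positives are mismatched
```

The loss is low when each sample is most similar to its own partner and high when the pairing is broken. The intra-image and intra-video variants differ only in what is fed in as the positive pair (two views of one image versus two frames of one video).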
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Human Pose and Action Recognition
Methods: Contrastive Learning
