Pixel-level Correspondence for Self-Supervised Learning from Video

Yash Sharma; Yi Zhu; Chris Russell; Thomas Brox

arXiv:2207.03866·cs.CV·July 11, 2022·1 cites

Open Access

TL;DR

PiCo is a dense contrastive learning method for video. It tracks points with optical flow to match local features across frames, improving performance on dense prediction tasks while maintaining image classification performance.

Contribution

The paper presents Pixel-level Correspondence (PiCo), a self-supervised method for dense contrastive learning from video that uses optical flow to establish pixel-level correspondences between frames.

Findings

01 Outperforms self-supervised baselines on multiple dense prediction tasks

02 Does so without compromising performance on image classification

03 Shows that pixel-level correspondence obtained from optical flow is an effective supervisory signal for learning from video

Abstract

While self-supervised learning has enabled effective representation learning in the absence of labels, for vision, video remains a relatively untapped source of supervision. To address this, we propose Pixel-level Correspondence (PiCo), a method for dense contrastive learning from video. By tracking points with optical flow, we obtain a correspondence map which can be used to match local features at different points in time. We validate PiCo on standard benchmarks, outperforming self-supervised baselines on multiple dense prediction tasks, without compromising performance on image classification.
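The tracking-and-matching idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `track_points` and `dense_info_nce`, the nearest-neighbour flow lookup, the assumption that features live on the same grid as the flow, and the use of a generic InfoNCE loss with a temperature of 0.1 are all assumptions made for illustration only.

```python
import numpy as np

def track_points(flows):
    """Chain per-frame forward flows to track every pixel of the first frame.

    flows: (T, H, W, 2) array; flows[t, y, x] is the (dx, dy) displacement
    from frame t to frame t+1 (hypothetical input layout).
    Returns the (x, y) position of each starting pixel in frame T and a
    mask of points that never left the image.
    """
    T, H, W, _ = flows.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    pos = np.stack([xs, ys], axis=-1).astype(np.float64)
    valid = np.ones((H, W), dtype=bool)
    for t in range(T):
        # Nearest-neighbour lookup of the flow at the current position.
        xi = np.clip(np.rint(pos[..., 0]).astype(int), 0, W - 1)
        yi = np.clip(np.rint(pos[..., 1]).astype(int), 0, H - 1)
        pos = pos + flows[t, yi, xi]
        valid &= (pos[..., 0] >= 0) & (pos[..., 0] <= W - 1)
        valid &= (pos[..., 1] >= 0) & (pos[..., 1] <= H - 1)
    return pos, valid

def dense_info_nce(feat_a, feat_b, pos, valid, temperature=0.1):
    """Generic InfoNCE loss over local features matched by the correspondence map.

    feat_a, feat_b: (H, W, C) local features of the first and last frame.
    Each tracked point in feat_a is pulled towards its match in feat_b and
    pushed away from the matches of all other tracked points.
    """
    xi = np.rint(pos[..., 0]).astype(int)[valid]
    yi = np.rint(pos[..., 1]).astype(int)[valid]
    q = feat_a[valid]        # (N, C) features at the starting points
    k = feat_b[yi, xi]       # (N, C) features at the tracked points
    q = q / np.linalg.norm(q, axis=1, keepdims=True)
    k = k / np.linalg.norm(k, axis=1, keepdims=True)
    logits = q @ k.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))
```

In this sketch, points that drift outside the image are simply dropped from the loss. A practical system would likely also need to handle occlusions and flow errors, which the sketch ignores.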

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Advanced Vision and Imaging

Methods: Contrastive Learning · Dense Contrastive Learning