Pixel-level Correspondence for Self-Supervised Learning from Video
Yash Sharma, Yi Zhu, Chris Russell, Thomas Brox

TL;DR
PiCo is a dense contrastive learning method for video: it tracks points with optical flow to obtain pixel-level correspondences, improving results on dense prediction tasks while maintaining image classification performance.
Contribution
The paper presents PiCo (Pixel-level Correspondence), a self-supervised method that uses optical flow to establish pixel-level correspondences across video frames for dense contrastive learning.
Findings
Outperforms self-supervised baselines on multiple dense prediction tasks
Maintains competitive performance on image classification tasks
Demonstrates effectiveness of pixel-level correspondence in video-based learning
Abstract
While self-supervised learning has enabled effective representation learning in the absence of labels, for vision, video remains a relatively untapped source of supervision. To address this, we propose Pixel-level Correspondence (PiCo), a method for dense contrastive learning from video. By tracking points with optical flow, we obtain a correspondence map which can be used to match local features at different points in time. We validate PiCo on standard benchmarks, outperforming self-supervised baselines on multiple dense prediction tasks, without compromising performance on image classification.
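The mechanism in the abstract (flow gives a correspondence map, which pairs local features across time) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes NumPy, a flow field already resampled to the feature-map resolution, nearest-pixel rounding of the tracked positions, and an InfoNCE-style loss with every location in the second frame serving as a candidate; the function names and the temperature value are hypothetical choices made for illustration.

```python
import numpy as np

def correspondence_from_flow(flow):
    """Map each pixel in frame t to its tracked position in frame t+1.

    flow: (H, W, 2) array where flow[y, x] = (dx, dy) is the displacement.
    Returns integer target coordinates (tx, ty) and a mask of points
    that stay inside the frame.
    """
    h, w, _ = flow.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    tx = np.rint(xs + flow[..., 0]).astype(int)  # assumption: nearest-pixel rounding
    ty = np.rint(ys + flow[..., 1]).astype(int)
    valid = (tx >= 0) & (tx < w) & (ty >= 0) & (ty < h)
    return tx, ty, valid

def dense_info_nce(feat_a, feat_b, flow, temperature=0.1):
    """InfoNCE-style loss over flow-matched local features (illustrative).

    feat_a, feat_b: (H, W, C) local feature maps of two frames.
    The positive for each source location is the feature at its tracked
    position in feat_b; all other locations in feat_b act as negatives.
    """
    h, w, c = feat_a.shape
    tx, ty, valid = correspondence_from_flow(flow)

    # L2-normalise so dot products are cosine similarities.
    a = feat_a.reshape(-1, c)
    b = feat_b.reshape(-1, c)
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)

    src = np.flatnonzero(valid.ravel())   # source indices with an in-frame match
    tgt = (ty * w + tx).ravel()[src]      # flat index of each matched location

    logits = a[src] @ b.T / temperature   # (N, H*W) similarity matrix
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(len(src)), tgt].mean()
```

In this sketch, points whose tracked position leaves the frame are simply dropped. A fuller pipeline would likely also need to handle occlusions and unreliable flow, which are outside the scope of this example.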
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Advanced Vision and Imaging
Methods: Contrastive Learning · Dense Contrastive Learning
