Cross Pixel Optical Flow Similarity for Self-Supervised Learning
Aravindh Mahendran, James Thewlis, Andrea Vedaldi

TL;DR
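This paper introduces a self-supervised learning method that uses optical flow similarity to learn image representations without manual labels. It reports state-of-the-art results among motion-based self-supervision methods and in pretraining for segmentation, with competitive results in image classification and detection.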
Contribution
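It proposes a simpler way to use motion cues for self-supervised learning: instead of predicting optical flow from a single image, which is intrinsically ambiguous, the network embeds pixels so that the similarity between their embeddings matches the similarity between their optical flow vectors.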
Findings
Achieves state-of-the-art results in self-supervised pretraining for segmentation.
Demonstrates competitive results in image classification and detection.
Simplifies previous motion-based self-supervision methods.
Abstract
We propose a novel method for learning convolutional neural image representations without manual supervision. We use motion cues, in the form of optical flow, to supervise representations of static images. The obvious approach of training a network to predict flow from a single image can be needlessly difficult due to intrinsic ambiguities in this prediction task. We instead propose a much simpler learning goal: embed pixels such that the similarity between their embeddings matches that between their optical flow vectors. At test time, the learned deep network can be used without access to video or flow information and transferred to tasks such as image classification, detection, and segmentation. Our method, which significantly simplifies previous attempts at using motion for self-supervision, achieves state-of-the-art results in self-supervision using motion cues, competitive results…
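To make the learning goal in the abstract concrete, here is a minimal sketch of one way a "similarity matching" loss could look. It is not taken from the paper. The function name `flow_similarity_loss`, the Gaussian kernel on flow vectors, the temperature-scaled cosine similarity on embeddings, the cross-entropy comparison, and the parameters `temperature` and `bandwidth` are all assumptions chosen for illustration. Plain Python lists stand in for per-pixel network outputs.

```python
import math

def log_softmax(scores):
    """Log of the softmax of a list of scores, shifted by the max for stability."""
    peak = max(scores)
    log_total = peak + math.log(sum(math.exp(s - peak) for s in scores))
    return [s - log_total for s in scores]

def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / (norm + 1e-8)

def flow_similarity_loss(embeddings, flows, temperature=0.1, bandwidth=1.0):
    """Mean cross-entropy between flow-based and embedding-based pixel affinities.

    embeddings: list of N embedding vectors, one per pixel (hypothetical network output).
    flows:      list of N optical flow vectors [dx, dy], one per pixel.
    """
    n = len(embeddings)
    total = 0.0
    for i in range(n):
        # Target affinities (assumed form): Gaussian kernel on flow differences,
        # normalised over all pixels j so each row is a distribution.
        flow_scores = [
            -sum((a - b) ** 2 for a, b in zip(flows[i], flows[j])) / (2.0 * bandwidth ** 2)
            for j in range(n)
        ]
        target = [math.exp(s) for s in log_softmax(flow_scores)]

        # Predicted affinities (assumed form): temperature-scaled cosine similarity.
        emb_scores = [cosine(embeddings[i], embeddings[j]) / temperature for j in range(n)]
        log_pred = log_softmax(emb_scores)

        # Cross-entropy between the two distributions for pixel i.
        total -= sum(p * lq for p, lq in zip(target, log_pred))
    return total / n

# Toy example: pixels 0-1 move right, pixels 2-3 move left.
flows = [[1.0, 0.0], [1.0, 0.0], [-1.0, 0.0], [-1.0, 0.0]]
aligned = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]   # groups match the flow
shuffled = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]  # groups ignore the flow
print(flow_similarity_loss(aligned, flows), flow_similarity_loss(shuffled, flows))
```

In this toy setup, embeddings that group pixels the same way the flow does should receive a lower loss than embeddings that ignore the flow. The loss only compares pairwise similarities, so the network is never asked to regress the flow vectors themselves, which is the simplification the abstract describes.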
