Learning Correspondence from the Cycle-Consistency of Time
Xiaolong Wang, Allan Jabri, Alexei A. Efros

TL;DR
This paper presents a self-supervised approach that uses cycle-consistency in time to learn visual correspondence from unlabeled video. The learned representation transfers, without finetuning, to video object segmentation, keypoint tracking, and optical flow.
Contribution
The paper introduces a self-supervised training method that treats cycle-consistency in time as a free supervisory signal, learning from scratch a feature representation for visual correspondence tasks.
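To make the training signal concrete, here is a minimal NumPy sketch of a cycle-consistency loss: a point is tracked backward through the frames and then forward again, and the loss is the squared distance between where it started and where it ends up. This is an illustration under simplifying assumptions, not the authors' implementation. The soft-attention tracker, the fixed grid of `coords`, the `temperature` value, and the function names `soft_step` and `cycle_loss` are all hypothetical choices made for the example.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array.
    e = np.exp(x - x.max())
    return e / e.sum()

def soft_step(query, feats, coords, temperature=0.05):
    """Softly locate `query` (C,) among `feats` (N, C) of one frame.

    Returns the attended, re-normalized feature and the expected (x, y)
    position. Assumed tracker: plain dot-product attention.
    """
    weights = softmax(feats @ query / temperature)
    new_query = weights @ feats
    new_query = new_query / (np.linalg.norm(new_query) + 1e-8)
    return new_query, weights @ coords

def cycle_loss(frames, coords, start_idx, temperature=0.05):
    """Track backward from the last frame to the first, then forward again.

    frames: list of (N, C) feature arrays, oldest first.
    coords: (N, 2) grid positions shared by all frames.
    """
    query = frames[-1][start_idx]
    start_pos = coords[start_idx]
    # Backward pass (T-1 ... 0) followed by forward pass (1 ... T).
    path = frames[-2::-1] + frames[1:]
    pos = start_pos
    for feats in path:
        query, pos = soft_step(query, feats, coords, temperature)
    # Cycle-consistency: the end point should coincide with the start point.
    return float(np.sum((pos - start_pos) ** 2))

# Toy usage: 4 locations on a 2x2 grid with perfectly distinctive features.
coords = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
frames = [np.eye(4)[[1, 0, 3, 2]], np.eye(4), np.eye(4)]
loss = cycle_loss(frames, coords, start_idx=2)
```

In the toy example the first frame holds the same features at permuted positions, so the track moves across the grid and still closes the cycle, giving a loss near zero. With a learned encoder, minimizing this quantity is what would push the features toward being trackable.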
Findings
Outperforms previous self-supervised methods
Performs competitively with strongly supervised methods
Generalizes without finetuning to video object segmentation, keypoint tracking, and optical flow
Abstract
We introduce a self-supervised method for learning visual correspondence from unlabeled video. The main idea is to use cycle-consistency in time as free supervisory signal for learning visual representations from scratch. At training time, our model learns a feature map representation to be useful for performing cycle-consistent tracking. At test time, we use the acquired representation to find nearest neighbors across space and time. We demonstrate the generalizability of the representation -- without finetuning -- across a range of visual correspondence tasks, including video object segmentation, keypoint tracking, and optical flow. Our approach outperforms previous self-supervised methods and performs competitively with strongly supervised methods.
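The test-time step described in the abstract, finding nearest neighbors across space and time, can be illustrated with a small label-propagation sketch. It assumes per-location features have already been extracted and L2-normalized. The top-k softmax weighting, the default `k` and `temperature`, and the name `propagate_labels` are assumptions made for this example, not details taken from the paper.

```python
import numpy as np

def propagate_labels(ref_feats, ref_labels, tgt_feats, k=5, temperature=0.1):
    """Copy labels from a reference frame to a target frame.

    ref_feats:  (N, C) features of labeled reference locations.
    ref_labels: (N, K) one-hot or soft labels (e.g. segmentation masks).
    tgt_feats:  (M, C) features of target locations.
    Returns (M, K) soft labels for the target frame.
    """
    sim = tgt_feats @ ref_feats.T                      # (M, N) similarities
    k = min(k, ref_feats.shape[0])
    topk = np.argsort(-sim, axis=1)[:, :k]             # k nearest neighbors
    topk_sim = np.take_along_axis(sim, topk, axis=1) / temperature
    topk_sim = topk_sim - topk_sim.max(axis=1, keepdims=True)
    weights = np.exp(topk_sim)
    weights = weights / weights.sum(axis=1, keepdims=True)
    # Weighted average of the neighbors' labels.
    return np.einsum("mk,mkc->mc", weights, ref_labels[topk])

# Toy usage: three reference locations, each with its own class.
ref_feats = np.eye(3)
ref_labels = np.eye(3)
tgt_feats = np.eye(3)[[2, 0, 1]]   # same features, shuffled positions
pred = propagate_labels(ref_feats, ref_labels, tgt_feats, k=1)
```

Because the representation is used only for matching, the same routine could in principle carry masks, keypoints, or other per-location annotations from frame to frame without any task-specific training.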
Videos
AI Learns Tracking People In Videos · YouTube
Taxonomy
Topics: Video Surveillance and Tracking Methods · Advanced Vision and Imaging · Advanced Image and Video Retrieval Techniques
