Rethinking Self-supervised Correspondence Learning: A Video Frame-level Similarity Perspective
Jiarui Xu, Xiaolong Wang

TL;DR
This paper introduces Video Frame-level Similarity (VFS) learning, a simple self-supervised approach that learns space-time correspondence by comparing entire video frames, and reports that it outperforms existing self-supervised methods on tracking and segmentation tasks.
Contribution
The paper proposes frame-level similarity learning for correspondence, departing from the usual patch-level or object-level approaches and drawing on the recent success of image-level contrastive learning.
Findings
VFS surpasses state-of-the-art self-supervised methods in tracking and segmentation.
Frame-level similarity learning reveals new properties for image and video correspondence.
A detailed analysis identifies which factors are crucial for effective VFS learning.
Abstract
Learning a good representation for space-time correspondence is the key for various computer vision tasks, including tracking object bounding boxes and performing video object pixel segmentation. To learn generalizable representations for correspondence at large scale, a variety of self-supervised pretext tasks have been proposed to explicitly perform object-level or patch-level similarity learning. Instead of following the previous literature, we propose to learn correspondence using Video Frame-level Similarity (VFS) learning, i.e., simply learning from comparing video frames. Our work is inspired by the recent success in image-level contrastive learning and similarity learning for visual recognition. Our hypothesis is that if the representation is good for recognition, it requires the convolutional features to find correspondence between similar objects or parts. Our experiments show…
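The idea of "learning from comparing video frames" can be illustrated with a small sketch. The abstract does not specify the loss, so the code below assumes an InfoNCE-style contrastive objective, one common choice in image-level contrastive learning, and is not necessarily the formulation used in the paper. The function names (`l2_normalize`, `cosine`, `frame_info_nce`), the `temperature` value, and the toy embeddings are all hypothetical; in practice the embeddings would come from a convolutional encoder applied to whole frames.

```python
import math


def l2_normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    u, v = l2_normalize(u), l2_normalize(v)
    return sum(a * b for a, b in zip(u, v))


def frame_info_nce(anchors, positives, temperature=0.1):
    """Assumed InfoNCE-style frame-level loss (illustrative only).

    anchors[i] and positives[i] are embeddings of two frames sampled
    from the same video i. Frames from the other videos in the batch
    serve as negatives for anchor i.
    """
    n = len(anchors)
    total = 0.0
    for i in range(n):
        # Similarity of anchor i to one frame from every video in the batch.
        logits = [cosine(anchors[i], positives[j]) / temperature for j in range(n)]
        # Numerically stable log-sum-exp over the logits.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        # Cross-entropy with the same-video frame as the correct class.
        total += log_denom - logits[i]
    return total / n


# Toy batch of two "videos" with 2-D frame embeddings.
anchors = [[1.0, 0.0], [0.0, 1.0]]
positives = [[1.0, 0.1], [0.1, 1.0]]

matched_loss = frame_info_nce(anchors, positives)
mismatched_loss = frame_info_nce(anchors, positives[::-1])
```

In this toy batch, pairing each anchor with a frame from its own video yields a lower loss than pairing it with a frame from the other video, which is the signal a frame-level objective of this kind would use to shape the encoder.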
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Video Surveillance and Tracking Methods · Advanced Image and Video Retrieval Techniques
Methods: Contrastive Learning
