Rethinking Self-supervised Correspondence Learning: A Video Frame-level Similarity Perspective

Jiarui Xu; Xiaolong Wang

arXiv:2103.17263·cs.CV·October 15, 2021

TL;DR

This paper introduces Video Frame-level Similarity (VFS) learning, a simple yet effective self-supervised approach that compares entire video frames to learn space-time correspondence, outperforming existing methods in tracking and segmentation tasks.

Contribution

The paper proposes learning correspondence from frame-level similarity, rather than the patch-level or object-level similarity used in prior work. The approach is motivated by the success of image-level contrastive learning.

Findings

01

VFS surpasses state-of-the-art self-supervised methods in tracking and segmentation.

02

Frame-level similarity learning reveals new properties for image and video correspondence.

03

Detailed analysis shows what factors are crucial for effective VFS learning.

Abstract

Learning a good representation for space-time correspondence is key to various computer vision tasks, including tracking object bounding boxes and performing video object pixel segmentation. To learn a generalizable representation for correspondence at scale, a variety of self-supervised pretext tasks have been proposed to explicitly perform object-level or patch-level similarity learning. Instead of following the previous literature, we propose to learn correspondence using Video Frame-level Similarity (VFS) learning, i.e., simply learning from comparing video frames. Our work is inspired by the recent success in image-level contrastive learning and similarity learning for visual recognition. Our hypothesis is that if the representation is good for recognition, it requires the convolutional features to find correspondence between similar objects or parts. Our experiments show…
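The abstract describes VFS only as "learning from comparing video frames" and does not spell out a loss here. As a rough illustration of what a frame-level similarity objective can look like, the sketch below uses an InfoNCE-style contrastive loss, which is an assumption borrowed from image-level contrastive learning and not necessarily the objective used in the paper. The function names, the temperature value, and the toy 2-D vectors standing in for encoder outputs are all hypothetical.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(u, v):
    """Cosine similarity between two whole-frame embeddings."""
    u, v = l2_normalize(u), l2_normalize(v)
    return sum(a * b for a, b in zip(u, v))


def frame_level_info_nce(anchors, positives, temperature=0.1):
    """Illustrative InfoNCE-style loss over whole-frame embeddings (an assumption).

    anchors[i] and positives[i] are embeddings of two frames sampled from
    the same video i. Frames from the other videos in the batch serve as
    negatives for anchor i.
    """
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        # Numerically stable log-sum-exp over all candidate frames.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        # Cross-entropy with the same-video frame as the correct class.
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


# Toy embeddings standing in for encoder outputs of two videos.
anchors = [[1.0, 0.0], [0.0, 1.0]]
same_video_frames = [[0.9, 0.1], [0.1, 0.9]]
wrong_video_frames = [[0.1, 0.9], [0.9, 0.1]]

aligned_loss = frame_level_info_nce(anchors, same_video_frames)
mismatched_loss = frame_level_info_nce(anchors, wrong_video_frames)
```

In this toy setup, pairing each anchor with a frame from its own video gives a lower loss than pairing it with a frame from the other video, which is the behavior a frame-level similarity objective is meant to reward.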

Code & Models

Models

niobures/mmaction2 (Hugging Face model)

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Video Surveillance and Tracking Methods · Advanced Image and Video Retrieval Techniques

Methods: Contrastive Learning