Semantic-Aware Fine-Grained Correspondence
Yingdong Hu, Renhao Wang, Kaifeng Zhang, Yang Gao

TL;DR
This paper introduces a semantic-aware self-supervised approach for fine-grained visual correspondence, combining image-level and pixel-level learning to improve performance on tasks like segmentation and tracking.
Contribution
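The paper proposes a self-supervised method that pairs semantic correspondence, obtained from image-level self-supervised learning, with a pixel-level objective aimed at fine-grained correspondence, and fuses the two representations for downstream matching.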
Findings
Outperforms previous self-supervised methods on multiple tasks
Effectively combines semantic and low-level features for better results
Achieves state-of-the-art performance on video object segmentation and tracking
Abstract
Establishing visual correspondence across images is a challenging and essential task. Recently, an influx of self-supervised methods have been proposed to better learn representations for visual correspondence. However, we find that these methods often fail to leverage semantic information and over-rely on the matching of low-level features. In contrast, human vision is capable of distinguishing between distinct objects as a pretext to tracking. Inspired by this paradigm, we propose to learn semantic-aware fine-grained correspondence. Firstly, we demonstrate that semantic correspondence is implicitly available through a rich set of image-level self-supervised methods. We further design a pixel-level self-supervised learning objective which specifically targets fine-grained correspondence. For downstream tasks, we fuse these two kinds of complementary correspondence representations…
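The fusion step mentioned at the end of the abstract can be illustrated with a minimal sketch. The abstract is truncated here, so the details are assumptions rather than the authors' confirmed method: the sketch fuses the two per-pixel feature sets by L2-normalized channel concatenation, then transfers labels from a reference frame to a target frame using a top-k softmax over cosine affinities. All names (`fuse_features`, `propagate_labels`, `weight`, `top_k`, `temperature`) are hypothetical.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors along `axis` to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def fuse_features(semantic, fine_grained, weight=1.0):
    """Fuse two per-pixel feature sets by concatenation (an assumed scheme).

    semantic:     (N, C1) features from an image-level model
    fine_grained: (N, C2) features from a pixel-level model
    returns:      (N, C1 + C2) unit-length fused features
    """
    fused = np.concatenate(
        [l2_normalize(semantic), weight * l2_normalize(fine_grained)], axis=-1
    )
    return l2_normalize(fused)

def propagate_labels(ref_feats, ref_labels, tgt_feats, top_k=5, temperature=0.07):
    """Transfer labels from reference pixels to target pixels.

    ref_feats:  (N, C) fused features of the reference frame
    ref_labels: (N, K) one-hot (or soft) labels of the reference frame
    tgt_feats:  (M, C) fused features of the target frame
    returns:    (M, K) soft labels for the target frame
    """
    sim = tgt_feats @ ref_feats.T                      # (M, N) cosine affinities
    k = min(top_k, sim.shape[1])
    idx = np.argpartition(-sim, k - 1, axis=1)[:, :k]  # k most similar reference pixels
    top_sim = np.take_along_axis(sim, idx, axis=1) / temperature
    top_sim = top_sim - top_sim.max(axis=1, keepdims=True)  # numerical stability
    w = np.exp(top_sim)
    w = w / w.sum(axis=1, keepdims=True)               # softmax over the k neighbors
    return np.einsum("mk,mkc->mc", w, ref_labels[idx])
```

In this sketch, `weight` controls how much the fine-grained features contribute relative to the semantic ones. Restricting the softmax to the top-k neighbors is a common choice in label-propagation evaluation for video object segmentation, but it is used here as an illustration only.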
Taxonomy
Topics: Video Surveillance and Tracking Methods · Human Pose and Action Recognition · Domain Adaptation and Few-Shot Learning
