Jamais Vu: Exposing the Generalization Gap in Supervised Semantic Correspondence
Octave Mariotti, Zhipeng Du, Yash Bhalgat, Oisin Mac Aodha, Hakan Bilen

TL;DR
This paper exposes the limitations of supervised semantic correspondence methods in generalizing beyond sparse keypoints and proposes a novel 3D lifting approach to improve dense correspondence learning.
Contribution
The paper introduces a method that lifts 2D keypoints into a canonical 3D space using monocular depth estimation, improving generalization in semantic correspondence tasks.
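To make the lifting idea concrete, here is a minimal sketch of back-projecting 2D keypoints into 3D with a depth map. It is an illustration under stated assumptions, not the authors' implementation: the function name `lift_keypoints`, the pinhole camera model, and the default focal length are all hypothetical. The summary says the method needs no camera annotations, so the intrinsics here are a stand-in. The result is in camera coordinates, and the mapping into a canonical space is not shown.

```python
import numpy as np


def lift_keypoints(keypoints_xy, depth_map, focal_length=None):
    """Back-project 2D pixel keypoints to 3D camera-frame points.

    keypoints_xy: (N, 2) array of (x, y) pixel coordinates.
    depth_map:    (H, W) array of per-pixel depth, e.g. the output of a
                  monocular depth estimator.
    focal_length: ASSUMED focal length in pixels (hypothetical parameter);
                  defaults to max(H, W) when not given.
    """
    keypoints_xy = np.asarray(keypoints_xy, dtype=np.float64)
    depth_map = np.asarray(depth_map, dtype=np.float64)
    h, w = depth_map.shape

    # Assumed pinhole intrinsics: principal point at the image center.
    f = float(max(h, w)) if focal_length is None else float(focal_length)
    cx, cy = (w - 1) / 2.0, (h - 1) / 2.0

    # Nearest-pixel depth lookup, clipped to the image bounds.
    cols = np.clip(np.rint(keypoints_xy[:, 0]).astype(int), 0, w - 1)
    rows = np.clip(np.rint(keypoints_xy[:, 1]).astype(int), 0, h - 1)
    z = depth_map[rows, cols]

    # Standard pinhole unprojection: X = (u - cx) * Z / f, Y = (v - cy) * Z / f.
    x = (keypoints_xy[:, 0] - cx) * z / f
    y = (keypoints_xy[:, 1] - cy) * z / f
    return np.stack([x, y, z], axis=1)
```

Calling `lift_keypoints(kps, depth)` on an `(N, 2)` keypoint array returns an `(N, 3)` array of camera-frame points. Expressing points from different images and viewpoints in one shared object-centric frame is a separate step that this sketch does not attempt.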
Findings
The proposed model outperforms supervised baselines on unseen keypoints.
Unsupervised methods outperform supervised ones when generalizing across datasets.
The approach captures object geometry without explicit 3D supervision.
Abstract
Semantic correspondence (SC) aims to establish semantically meaningful matches across different instances of an object category. We illustrate how recent supervised SC methods remain limited in their ability to generalize beyond sparsely annotated training keypoints, effectively acting as keypoint detectors. To address this, we propose a novel approach for learning dense correspondences by lifting 2D keypoints into a canonical 3D space using monocular depth estimation. Our method constructs a continuous canonical manifold that captures object geometry without requiring explicit 3D supervision or camera annotations. Additionally, we introduce SPair-U, an extension of SPair-71k with novel keypoint annotations, to better assess generalization. Experiments not only demonstrate that our model significantly outperforms supervised baselines on unseen keypoints, highlighting its effectiveness…
Taxonomy
Topics: 3D Shape Modeling and Analysis · Multimodal Machine Learning Applications · Human Pose and Action Recognition
