Telling Left from Right: Identifying Geometry-Aware Semantic Correspondence
Junyi Zhang, Charles Herrmann, Junhwa Hur, Eric Chen, Varun Jampani, Deqing Sun, Ming-Hsuan Yang

TL;DR
This paper emphasizes the importance of geometry-awareness in semantic correspondence models, demonstrating that incorporating geometric information significantly improves performance in both zero-shot and supervised settings, and introduces a new challenging benchmark.
Contribution
It reveals the limitations of current foundation models in geometry understanding and proposes simple solutions to enhance semantic correspondence performance.
Findings
Achieved 65.4 PCK@0.10 in the zero-shot setting and 85.6 in the supervised setting on SPair-71k.
Outperformed the state of the art by 5.5 and 11.0 percentage points, respectively.
Constructed a new challenging benchmark from an animal pose dataset.
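The results above are reported as PCK@0.10 (percentage of correct keypoints). The following is a minimal sketch of how this metric can be computed for one image pair. It assumes the common SPair-71k convention: a predicted keypoint counts as correct when it lies within α·max(h, w) of its ground-truth location, where h and w are the target object's bounding-box height and width. The function name `pck`, the inclusive `<=` boundary, and per-pair averaging are illustrative assumptions. Evaluation code can differ, for example in whether scores are averaged per image or per keypoint.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def pck(pred: Sequence[Point], gt: Sequence[Point],
        bbox_hw: Tuple[float, float], alpha: float = 0.10) -> float:
    """Fraction of predicted keypoints within alpha * max(bbox h, bbox w)
    of their ground-truth keypoints (assumed bbox-normalized convention)."""
    if len(pred) != len(gt):
        raise ValueError("pred and gt must contain the same number of keypoints")
    if not gt:
        raise ValueError("at least one keypoint is required")
    # Distance threshold scales with the size of the target object.
    threshold = alpha * max(bbox_hw)
    # Assumption: a point exactly on the threshold counts as correct.
    correct = sum(1 for p, g in zip(pred, gt) if math.dist(p, g) <= threshold)
    return correct / len(gt)


if __name__ == "__main__":
    pred = [(10, 10), (50, 50), (100, 100)]
    gt = [(12, 10), (80, 50), (100, 119)]
    # Bounding box 200 x 100 -> threshold 0.10 * 200 = 20 pixels.
    # Errors are 2, 30 and 19 pixels, so 2 of 3 keypoints are correct.
    print(pck(pred, gt, bbox_hw=(200, 100)))  # 0.6666666666666666
```

Normalizing by the object's bounding box rather than the image size keeps the tolerance comparable across small and large instances.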
Abstract
While pre-trained large-scale vision models have shown significant promise for semantic correspondence, their features often struggle to grasp the geometry and orientation of instances. This paper identifies the importance of being geometry-aware for semantic correspondence and reveals a limitation of the features of current foundation models under simple post-processing. We show that incorporating this information can markedly enhance semantic correspondence performance with simple but effective solutions in both zero-shot and supervised settings. We also construct a new challenging benchmark for semantic correspondence built from an existing animal pose estimation dataset, for both pre-training and validating models. Our method achieves a PCK@0.10 score of 65.4 (zero-shot) and 85.6 (supervised) on the challenging SPair-71k dataset, outperforming the state of the art by 5.5p and 11.0p…
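In the zero-shot setting, correspondences are typically read directly from the frozen features of a pre-trained model. A common baseline matches each source keypoint to the target location whose feature has the highest cosine similarity. The sketch below assumes that setup. It is a generic nearest-neighbor matcher, not the paper's geometry-aware method. The `FeatureMap` layout and the helper names are hypothetical, and the toy vectors stand in for real backbone features.

```python
import math
from typing import Dict, List, Sequence, Tuple

Location = Tuple[int, int]
# Hypothetical layout: a dense feature map stored as {(row, col): feature vector}.
# A real pipeline would use per-patch features from a pre-trained backbone.
FeatureMap = Dict[Location, Sequence[float]]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def match_keypoints(src: FeatureMap, tgt: FeatureMap,
                    src_kps: List[Location]) -> List[Location]:
    """Nearest-neighbor matching: map each source keypoint to the target
    location whose feature is most similar under cosine similarity."""
    matches = []
    for kp in src_kps:
        query = src[kp]
        # max() returns the first location on ties, so symmetric parts with
        # identical features are resolved by iteration order alone.
        best = max(tgt, key=lambda loc: cosine_similarity(query, tgt[loc]))
        matches.append(best)
    return matches


if __name__ == "__main__":
    src = {(0, 0): [1.0, 0.0], (0, 1): [0.0, 1.0]}
    tgt = {(5, 5): [0.0, 2.0], (6, 6): [3.0, 0.1]}
    print(match_keypoints(src, tgt, [(0, 0), (0, 1)]))  # [(6, 6), (5, 5)]
```

This matcher compares only feature appearance. As a result, symmetric parts with near-identical features, such as a left and a right paw, can receive almost equal scores, and the argmax then picks between them essentially arbitrarily. This is the kind of geometric ambiguity the abstract describes.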
Taxonomy
Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
