Telling Left from Right: Identifying Geometry-Aware Semantic Correspondence

Junyi Zhang; Charles Herrmann; Junhwa Hur; Eric Chen; Varun Jampani; Deqing Sun; Ming-Hsuan Yang

arXiv:2311.17034 · cs.CV · March 26, 2024 · 2 cites

TL;DR

This paper emphasizes the importance of geometry-awareness in semantic correspondence models, demonstrating that incorporating geometric information significantly improves performance in both zero-shot and supervised settings, and introduces a new challenging benchmark.

Contribution

The paper reveals limitations in how current foundation models understand geometry and proposes simple solutions that improve semantic correspondence performance.

Findings

01

Achieved 65.4 PCK@0.10 in the zero-shot setting and 85.6 in the supervised setting on SPair-71k.

02

Outperformed the previous state of the art by 5.5 and 11.0 percentage points, respectively.

03

Constructed a new challenging benchmark from an animal pose dataset.
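The headline numbers use PCK@α (percentage of correct keypoints), with α = 0.10 on SPair-71k. Below is a minimal sketch of the metric for a single image pair. It assumes the common SPair-71k convention: a prediction counts as correct when it lies within α·max(h, w) pixels of the ground truth, where h and w are the height and width of the target object's bounding box. The function name `pck` and its arguments are illustrative and are not taken from the authors' repository. Aggregation over pairs (per image or per keypoint) is left out because papers differ on it.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]  # (x, y) pixel coordinates


def pck(pred: Sequence[Point], gt: Sequence[Point],
        bbox_hw: Tuple[float, float], alpha: float = 0.10) -> float:
    """Fraction of keypoints predicted within alpha * max(h, w) of ground truth.

    bbox_hw is the (height, width) of the target object's bounding box;
    normalizing by its larger side is an assumed, commonly used convention.
    """
    if len(pred) != len(gt):
        raise ValueError("pred and gt must have the same number of keypoints")
    if not gt:
        raise ValueError("need at least one keypoint")
    threshold = alpha * max(bbox_hw)  # distance tolerance in pixels
    correct = sum(
        1
        for (px, py), (gx, gy) in zip(pred, gt)
        if math.hypot(px - gx, py - gy) <= threshold  # Euclidean distance
    )
    return correct / len(gt)


# Toy example: box of height 200 and width 100, so the threshold is 20 px.
pred = [(0, 0), (50, 50), (100, 100), (5, 5)]
gt = [(10, 10), (80, 50), (100, 119), (5, 30)]
print(pck(pred, gt, bbox_hw=(200, 100)))  # 0.5
```

In the toy example the errors are about 14.1, 30, 19 and 25 px, so two of the four predictions fall inside the 20 px tolerance and the pair scores 0.5, which would be reported as 50.0.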

Abstract

While pre-trained large-scale vision models have shown significant promise for semantic correspondence, their features often struggle to grasp the geometry and orientation of instances. This paper identifies the importance of being geometry-aware for semantic correspondence and reveals a limitation of the features of current foundation models under simple post-processing. We show that incorporating this information can markedly enhance semantic correspondence performance with simple but effective solutions in both zero-shot and supervised settings. We also construct a new challenging benchmark for semantic correspondence built from an existing animal pose estimation dataset, for both pre-training and validating models. Our method achieves a PCK@0.10 score of 65.4 (zero-shot) and 85.6 (supervised) on the challenging SPair-71k dataset, outperforming the state of the art by 5.5p and 11.0p…
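For context on the zero-shot setting, a common baseline matches each source keypoint to the target location whose frozen foundation-model feature is most similar, typically by cosine similarity. The sketch below implements that generic nearest-neighbor baseline. It is not the paper's geometry-aware method, which the abstract does not detail. The function names and the two-dimensional toy feature vectors are hypothetical. They are chosen to show the failure mode the title points at: when left and right parts have near-identical features, both source eyes can land on the same target eye.

```python
import math
from typing import Dict, Sequence, Tuple

Vector = Sequence[float]
Location = Tuple[int, int]  # (x, y) pixel coordinates in the target image


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_neighbor_match(
    src_feats: Dict[str, Vector], tgt_feats: Dict[Location, Vector]
) -> Dict[str, Location]:
    """Map each named source keypoint to its most similar target location."""
    return {
        name: max(tgt_feats, key=lambda loc: cosine(feat, tgt_feats[loc]))
        for name, feat in src_feats.items()
    }


# Hypothetical toy features: the two eyes look almost alike, the nose is distinct.
src = {"left_eye": [1.0, 0.10], "right_eye": [1.0, 0.12], "nose": [0.1, 1.0]}
tgt = {
    (10, 20): [0.90, 0.20],  # stands in for the target's left eye
    (30, 20): [0.95, 0.15],  # stands in for the target's right eye
    (20, 40): [0.0, 1.0],    # stands in for the target's nose
}
print(nearest_neighbor_match(src, tgt))
# {'left_eye': (30, 20), 'right_eye': (30, 20), 'nose': (20, 40)}
```

Both eyes collapse onto (30, 20). Appearance similarity alone does not say which side a part is on, and this is the kind of geometry awareness the abstract says these features often lack.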

Code & Models

Repositories

Junyi42/geoaware-sc
PyTorch · Official

Taxonomy

Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning