Gromov Wasserstein Optimal Transport for Semantic Correspondences
Francis Snelgar, Stephen Gould, Ming Xu, Liang Zheng, Akshay Asthana

TL;DR
This paper introduces a Gromov Wasserstein optimal transport matching method for semantic correspondence between image pairs. It is more efficient than existing ensemble-based approaches that combine features from several large foundation models, and competitive with them in accuracy.
Contribution
The authors replace standard nearest neighbor matching with a Gromov Wasserstein optimal transport algorithm. This improves accuracy over the DINOv2 baseline and is 5-10x more efficient than ensemble-based approaches on semantic correspondence tasks.
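The contrast between the two matching strategies can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes per-patch feature vectors and 2D patch coordinates as inputs, and it swaps in a generic entropic fused Gromov Wasserstein solver (Sinkhorn projections around a linearized Gromov Wasserstein cost). All function names and the parameters `alpha`, `eps`, and `n_outer` are hypothetical choices made for the example.

```python
import numpy as np


def cosine_cost(fa, fb):
    """Cosine distance between every row of fa (n, d) and fb (m, d)."""
    fa = fa / np.linalg.norm(fa, axis=1, keepdims=True)
    fb = fb / np.linalg.norm(fb, axis=1, keepdims=True)
    return 1.0 - fa @ fb.T


def nearest_neighbour_match(fa, fb):
    """Baseline: each source patch independently picks its closest target patch."""
    return cosine_cost(fa, fb).argmin(axis=1)


def pairwise_dist(x):
    """Euclidean distances between all patch coordinates, scaled to [0, 1]."""
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    return d / d.max()


def sinkhorn(cost, p, q, eps, n_iters=200):
    """Entropic OT plan with row marginal p and (approximately) column marginal q."""
    # The row-wise shift leaves the converged plan unchanged; it only limits underflow.
    K = np.exp(-(cost - cost.min(axis=1, keepdims=True)) / eps)
    u = np.ones_like(p)
    for _ in range(n_iters):
        v = q / (K.T @ u)
        u = p / (K @ v)
    return u[:, None] * K * v[None, :]


def fused_gw_match(fa, fb, xa, xb, alpha=0.5, eps=0.05, n_outer=20):
    """Assumed sketch: feature cost plus a Gromov Wasserstein spatial term.

    alpha balances the two terms, eps is the entropic regularization strength.
    Returns the hard match per source patch and the soft transport plan.
    """
    n, m = len(fa), len(fb)
    p = np.full(n, 1.0 / n)  # uniform mass on source patches
    q = np.full(m, 1.0 / m)  # uniform mass on target patches
    M = cosine_cost(fa, fb)
    Ca, Cb = pairwise_dist(xa), pairwise_dist(xb)
    # Part of the squared-loss GW cost that does not depend on the plan.
    const = (Ca**2 @ p)[:, None] + (Cb**2 @ q)[None, :]
    T = np.outer(p, q)
    for _ in range(n_outer):
        # Linearized GW cost at the current plan: sum_kl (Ca_ik - Cb_jl)^2 T_kl.
        gw_cost = const - 2.0 * Ca @ T @ Cb.T
        T = sinkhorn((1.0 - alpha) * M + alpha * gw_cost, p, q, eps)
    return T.argmax(axis=1), T


# Toy usage: a 4x4 patch grid whose target is a shuffled, slightly noisy copy.
rng = np.random.default_rng(0)
xa = np.stack(np.meshgrid(np.arange(4), np.arange(4), indexing="ij"), -1)
xa = xa.reshape(-1, 2).astype(float)
fa = rng.normal(size=(16, 64))
perm = rng.permutation(16)
fb, xb = fa[perm] + 0.01 * rng.normal(size=(16, 64)), xa[perm]
nn_match = nearest_neighbour_match(fa, fb)
gw_match, plan = fused_gw_match(fa, fb, xa, xb)
```

The difference in design is that nearest neighbor matching scores each patch on its own, while the Gromov Wasserstein term penalizes plans that map nearby source patches to distant target patches, which is one way to encourage spatially consistent correspondences from a single feature extractor.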
Findings
Boosts DINOv2 baseline performance
Competitive with state-of-the-art methods using SD features
Is 5-10x more efficient than ensemble-based methods
Abstract
Establishing correspondences between image pairs is a long-studied problem in computer vision. With recent large-scale foundation models showing strong zero-shot performance on downstream tasks including classification and segmentation, there has been interest in using the internal feature maps of these models for the semantic correspondence task. Recent works observe that features from DINOv2 and Stable Diffusion (SD) are complementary: the former produces accurate but sparse correspondences, while the latter produces spatially consistent correspondences. As a result, current state-of-the-art methods for semantic correspondence involve combining features from both models in an ensemble. While the performance of these methods is impressive, they are computationally expensive, since they require evaluating feature maps from large-scale foundation models. In this work we take a different…
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning
