MARCO: Navigating the Unseen Space of Semantic Correspondence

Claudia Cuttano; Gabriele Trivigno; Carlo Masone; Stefan Roth

arXiv:2604.18267 · cs.CV · April 21, 2026

TL;DR

MARCO is a DINOv2-based model for semantic correspondence. Its training framework expands sparse keypoint supervision into dense correspondences, improving both fine-grained localization and generalization. It outperforms previous models while being smaller and faster.

Contribution

Introduces MARCO, a unified model built on DINOv2 whose training framework couples a coarse-to-fine objective with self-distillation, improving both localization and generalization in semantic correspondence and outperforming prior models.

Findings

01

Sets new state-of-the-art on SPair-71k, AP-10K, and PF-PASCAL.

02

Achieves +8.9 PCK at fine-grained localization thresholds.

03

Demonstrates strong generalization to unseen keypoints and categories.
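
The localization results on these benchmarks are conventionally reported as PCK (Percentage of Correct Keypoints): a predicted keypoint counts as correct when it lands within a fraction alpha of the object bounding-box size from the ground truth. The sketch below is a minimal illustration of that metric, not the authors' evaluation code; the function name `pck`, the toy coordinates, and the alpha values are assumptions chosen only to show how a tighter threshold makes the score stricter.

```python
import numpy as np

def pck(pred, gt, bbox_size, alpha=0.1):
    """Fraction of predictions within alpha * bbox_size of the ground truth.

    pred, gt: (N, 2) keypoint coordinates in pixels.
    bbox_size: reference length, typically max(bbox height, bbox width).
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    dist = np.linalg.norm(pred - gt, axis=1)  # per-keypoint pixel error
    return float(np.mean(dist <= alpha * bbox_size))

# Toy data (hypothetical): pixel errors are 2, 2, and 20.
pred = [[10, 10], [52, 50], [90, 80]]
gt = [[10, 12], [50, 50], [70, 80]]
bbox = 100.0

score_coarse = pck(pred, gt, bbox, alpha=0.1)   # threshold 10 px: 2 of 3 correct
score_fine = pck(pred, gt, bbox, alpha=0.01)    # threshold 1 px: 0 of 3 correct
```

Lowering alpha shrinks the tolerance radius, so gains at small thresholds reflect sharper localization rather than merely landing on the right object part.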

Abstract

Recent advances in semantic correspondence rely on dual-encoder architectures, combining DINOv2 with diffusion backbones. While accurate, these billion-parameter models generalize poorly beyond training keypoints, revealing a gap between benchmark performance and real-world usability, where queried points rarely match those seen during training. Building upon DINOv2, we introduce MARCO, a unified model for generalizable correspondence driven by a novel training framework that enhances both fine-grained localization and semantic generalization. By coupling a coarse-to-fine objective that refines spatial precision with a self-distillation framework, which expands sparse supervision beyond annotated regions, our approach transforms a handful of keypoints into dense, semantically coherent correspondences. MARCO sets a new state of the art on SPair-71k, AP-10K, and PF-PASCAL, with gains that…
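
To make the task concrete, the sketch below shows how a correspondence query is commonly answered once dense features are available: each source keypoint is matched to the target location with the highest cosine similarity. This is a generic nearest-neighbor baseline under stated assumptions, not MARCO's training objective or the authors' inference code; random arrays stand in for backbone feature maps, and `match_keypoints` is a hypothetical helper name.

```python
import numpy as np

def match_keypoints(feat_src, feat_tgt, kps_src):
    """Match source keypoints to target locations by cosine similarity.

    feat_src, feat_tgt: (H, W, C) dense feature maps.
    kps_src: list of (row, col) positions on the source feature grid.
    Returns a list of (row, col) positions on the target feature grid.
    """
    H, W, C = feat_tgt.shape
    tgt = feat_tgt.reshape(-1, C)
    tgt = tgt / (np.linalg.norm(tgt, axis=1, keepdims=True) + 1e-8)
    matches = []
    for r, c in kps_src:
        q = feat_src[r, c]
        q = q / (np.linalg.norm(q) + 1e-8)
        idx = int(np.argmax(tgt @ q))    # best-matching flattened index
        matches.append(divmod(idx, W))   # back to (row, col)
    return matches

# Stand-in features: the target map is the source map shifted by (1, 2).
rng = np.random.default_rng(0)
feat_src = rng.normal(size=(8, 8, 16))
feat_tgt = np.roll(feat_src, shift=(1, 2), axis=(0, 1))

matches = match_keypoints(feat_src, feat_tgt, [(3, 4), (0, 0)])
# matches == [(4, 6), (1, 2)]
```

Because any grid location can be queried this way, quality on points that were never annotated during training depends entirely on how well the features generalize, which is the gap the abstract describes.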

Code & Models

Repositories

visinf/MARCO (GitHub)
