TL;DR
V2-SAM is a framework that extends SAM2 to cross-view object correspondence by integrating geometry-aware and appearance-guided prompts, achieving state-of-the-art results on Ego-Exo4D, DAVIS-2017, and HANDAL-X.
Contribution
The paper introduces V2-SAM, combining prompt generators and a multi-expert system with cyclic consistency for improved cross-view object correspondence.
Findings
Achieves new state-of-the-art results on the Ego-Exo4D, DAVIS-2017, and HANDAL-X datasets.
Effectively combines geometry-aware and appearance-guided prompts.
Demonstrates the benefit of adaptive expert selection via cyclic consistency.
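The idea of selecting an expert by cyclic consistency can be illustrated with a small sketch. This is not the paper's implementation: it assumes each expert proposes a mask in the target view, that some function can map a target-view mask back to the source view, and that the expert whose mapped-back mask best overlaps the original query mask is kept. The expert names, the `map_back` function, and the IoU scoring rule are all hypothetical choices made for illustration.

```python
from typing import Callable, Dict, Set, Tuple

Pixel = Tuple[int, int]
Mask = Set[Pixel]  # a mask represented as a set of (row, col) foreground pixels


def iou(a: Mask, b: Mask) -> float:
    """Intersection-over-union of two pixel sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_by_cycle_consistency(
    query_mask: Mask,
    expert_predictions: Dict[str, Mask],
    map_back: Callable[[Mask], Mask],
) -> Tuple[str, float]:
    """Return the expert whose target-view mask, mapped back to the source
    view, overlaps the original query mask the most (assumed scoring rule)."""
    scores = {
        name: iou(query_mask, map_back(pred))
        for name, pred in expert_predictions.items()
    }
    best = max(scores, key=lambda name: scores[name])
    return best, scores[best]


# Toy example: the "views" differ by a 10-column shift (hypothetical).
def shift_back(mask: Mask) -> Mask:
    return {(r, c - 10) for r, c in mask}


query = {(0, 0), (0, 1), (1, 0), (1, 1)}
predictions = {
    "anchor_expert": {(0, 10), (0, 11), (1, 10), (1, 11)},  # full object
    "visual_expert": {(0, 10), (0, 11)},                    # half the object
}
best_expert, best_score = select_by_cycle_consistency(query, predictions, shift_back)
```

In this toy case the first expert reproduces the query mask exactly after mapping back, so it is selected with a score of 1.0, while the second scores 0.5.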
Abstract
Cross-view object correspondence, exemplified by the representative task of ego-exo object correspondence, aims to establish consistent associations of the same object across different viewpoints (e.g., egocentric and exocentric). This task poses significant challenges due to drastic viewpoint and appearance variations, making existing segmentation models, such as SAM2, difficult to apply directly. To address this, we present V2-SAM, a unified cross-view object correspondence framework that adapts SAM2 from single-view segmentation to cross-view correspondence through two complementary prompt generators. Specifically, the Cross-View Anchor Prompt Generator (V2-Anchor), built upon DINOv3 features, establishes geometry-aware correspondences and, for the first time, enables coordinate-based prompting for SAM2 in cross-view scenarios, while the Cross-View Visual Prompt Generator (V2-Visual)…
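As a rough illustration of how feature correspondences could yield a coordinate prompt, the sketch below matches one query feature against a grid of target-view patch features and returns the pixel center of the best-matching patch. It is an assumption-laden toy, not the V2-Anchor design: the plain lists stand in for DINOv3 patch features, cosine nearest-neighbor matching is an assumed matching rule, and `anchor_point` and `patch_size` are hypothetical names. No DINOv3 or SAM2 API is called.

```python
import math
from typing import List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def anchor_point(
    query_feature: Sequence[float],
    target_features: List[List[Sequence[float]]],
    patch_size: int,
) -> Tuple[float, float]:
    """Return the (x, y) pixel center of the target patch most similar to
    the query feature. target_features is indexed as [row][col]."""
    best_sim, best_rc = -2.0, (0, 0)  # cosine similarity is always >= -1
    for r, row in enumerate(target_features):
        for c, feat in enumerate(row):
            sim = cosine(query_feature, feat)
            if sim > best_sim:
                best_sim, best_rc = sim, (r, c)
    r, c = best_rc
    return ((c + 0.5) * patch_size, (r + 0.5) * patch_size)


# Toy 2x2 feature grid with 16-pixel patches (all values hypothetical).
query_feature = [1.0, 0.0]
target_grid = [
    [[0.0, 1.0], [0.9, 0.1]],
    [[-1.0, 0.0], [0.5, 0.5]],
]
point_xy = anchor_point(query_feature, target_grid, patch_size=16)
```

Here the patch at row 0, column 1 is the closest match, so the resulting point is (24.0, 8.0). A point of this form is the kind of coordinate that a point-promptable segmenter could take as input.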
