Cross-View Completion Models are Zero-shot Correspondence Estimators

Honggyu An; Jinhyeon Kim; Seonghoon Park; Jaewoo Jung; Jisang Han; Sunghwan Hong; Seungryong Kim

arXiv:2412.09072·cs.CV·December 13, 2024

TL;DR

This paper shows that the cross-attention maps inside cross-view completion models can be used to estimate correspondences in a zero-shot setting, and that they are also effective in learning-based geometric matching and multi-frame depth estimation.

Contribution

The paper demonstrates that the cross-attention map in cross-view completion models captures correspondence more effectively than correlations derived from encoder or decoder features, and uses it for zero-shot correspondence estimation.
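To make the comparison concrete, the sketch below contrasts the two kinds of correlation being compared: a cosine-similarity correlation computed from patch features, and a cross-attention map computed as scaled dot-product attention. It is a minimal illustration, not the paper's implementation. The function names, the token counts, and the random placeholder features are assumptions; in a real model the queries and keys would come from learned projections inside the decoder.

```python
import numpy as np

def feature_correlation(feat_tgt, feat_src):
    """Cosine-similarity correlation between patch features.

    feat_tgt: (N_t, C) target-view features, feat_src: (N_s, C) source-view features.
    Returns an (N_t, N_s) matrix with values in [-1, 1].
    """
    t = feat_tgt / np.linalg.norm(feat_tgt, axis=1, keepdims=True)
    s = feat_src / np.linalg.norm(feat_src, axis=1, keepdims=True)
    return t @ s.T

def cross_attention_map(q_tgt, k_src):
    """Scaled dot-product attention weights, softmax(Q K^T / sqrt(d)).

    Each row is a distribution over source tokens for one target token.
    """
    d = q_tgt.shape[1]
    logits = q_tgt @ k_src.T / np.sqrt(d)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    return weights / weights.sum(axis=1, keepdims=True)

# Placeholder inputs (assumption): 16 tokens per view, 8 channels.
rng = np.random.default_rng(0)
feat_tgt = rng.normal(size=(16, 8))
feat_src = rng.normal(size=(16, 8))

corr = feature_correlation(feat_tgt, feat_src)
attn = cross_attention_map(feat_tgt, feat_src)
```

Both outputs have one row per target token and one column per source token, so either can be read as a matching cost. The difference is that the attention rows are already normalized distributions over the source view.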

Findings

01

Cross-attention maps capture correspondence better than correlations derived from encoder or decoder features.

02

The method achieves state-of-the-art results in zero-shot matching.

03

The cross-attention map is also effective in learning-based geometric matching and multi-frame depth estimation.

Abstract

In this work, we explore new perspectives on cross-view completion learning by drawing an analogy to self-supervised correspondence learning. Through our analysis, we demonstrate that the cross-attention map within cross-view completion models captures correspondence more effectively than other correlations derived from encoder or decoder features. We verify the effectiveness of the cross-attention map by evaluating on both zero-shot matching and learning-based geometric matching and multi-frame depth estimation. Project page is available at https://cvlab-kaist.github.io/ZeroCo/.
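As an illustration of how a cross-attention map can serve as a correspondence estimate, the sketch below converts a row-normalized attention matrix into a dense flow field with a soft-argmax (the expected source coordinate under each target patch's attention distribution). This is a hedged sketch under stated assumptions: the abstract does not specify the readout, so the soft-argmax, the function name `attention_to_flow`, the patch-grid layout, and the toy one-hot attention map are all illustrative choices rather than the authors' procedure.

```python
import numpy as np

def attention_to_flow(attn, h, w):
    """Convert an attention map into a flow field in patch units.

    attn: (h*w, h*w) matrix; row i is a distribution over source patches
          for target patch i (row-major patch order is assumed).
    Returns flow of shape (h, w, 2) holding (dx, dy) per target patch.
    """
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    grid = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(np.float64)  # (h*w, 2) as (x, y)
    matched = attn @ grid  # soft-argmax: expected source coordinate per target patch
    return (matched - grid).reshape(h, w, 2)

# Toy attention map (assumption): every target patch attends to the source
# patch one step to its right, clamped at the image border.
h, w = 4, 4
attn = np.zeros((h * w, h * w))
for y in range(h):
    for x in range(w):
        x_src = min(x + 1, w - 1)
        attn[y * w + x, y * w + x_src] = 1.0

flow = attention_to_flow(attn, h, w)
```

In this toy case the recovered flow is one patch to the right everywhere except the last column, where the clamp makes it zero. With a real attention map the rows are soft distributions, so the soft-argmax yields sub-patch displacements; a hard argmax over each row would be an alternative readout.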

Taxonomy

Topics: Statistical Methods and Inference