Grounding Image Matching in 3D with MASt3R
Vincent Leroy, Yohann Cabon, Jérôme Revaud

TL;DR
This paper introduces MASt3R, a 3D-grounded image matching framework. It augments DUSt3R, a Transformer-based 3D reconstruction model, with a head that outputs dense local features and with a fast reciprocal matching scheme, improving matching accuracy while preserving robustness to extreme viewpoint changes.
Contribution
The paper proposes MASt3R, a 3D image matching approach that combines dense local features with a reciprocal matching scheme, improving accuracy and speed over prior 3D matching methods.
Findings
Outperforms state-of-the-art on multiple matching tasks.
Achieves 30% absolute improvement in VCRE AUC on Map-free localization.
Provides a theoretically guaranteed fast matching scheme.
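As a rough illustration of what reciprocal matching means, the sketch below keeps a pair of descriptors only when each is the other's nearest neighbor. It is a brute-force baseline for intuition, not the paper's fast scheme. The dot-product similarity and the names `dot`, `nearest`, and `reciprocal_matches` are assumptions made for this example.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot-product similarity between two descriptors (assumed metric)."""
    return sum(x * y for x, y in zip(a, b))


def nearest(query: Vector, candidates: Sequence[Vector]) -> int:
    """Index of the candidate most similar to the query (first one on ties)."""
    return max(range(len(candidates)), key=lambda j: dot(query, candidates[j]))


def reciprocal_matches(
    feats1: Sequence[Vector], feats2: Sequence[Vector]
) -> List[Tuple[int, int]]:
    """Brute-force mutual nearest neighbors between two descriptor sets.

    A pair (i, j) is kept only if j is the best match for i in feats2
    and i is the best match for j in feats1.
    """
    nn12 = [nearest(f, feats2) for f in feats1]  # image 1 -> image 2
    nn21 = [nearest(f, feats1) for f in feats2]  # image 2 -> image 1
    return [(i, j) for i, j in enumerate(nn12) if nn21[j] == i]


if __name__ == "__main__":
    # Toy 2-D descriptors; real dense features would have one per pixel.
    feats1 = [(1.0, 0.0), (0.0, 1.0), (0.6, 0.8)]
    feats2 = [(0.9, 0.1), (0.1, 0.9)]
    print(reciprocal_matches(feats1, feats2))
```

This exhaustive version compares every descriptor in one image against every descriptor in the other, so its cost grows with the product of the two set sizes. That quadratic cost on dense feature maps is what motivates a faster scheme.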
Abstract
Image Matching is a core component of all best-performing algorithms and pipelines in 3D vision. Yet despite matching being fundamentally a 3D problem, intrinsically linked to camera pose and scene geometry, it is typically treated as a 2D problem. This makes sense as the goal of matching is to establish correspondences between 2D pixel fields, but also seems like a potentially hazardous choice. In this work, we take a different stance and propose to cast matching as a 3D task with DUSt3R, a recent and powerful 3D reconstruction framework based on Transformers. Based on pointmaps regression, this method displayed impressive robustness in matching views with extreme viewpoint changes, yet with limited accuracy. We aim here to improve the matching capabilities of such an approach while preserving its robustness. We thus propose to augment the DUSt3R network with a new head that outputs…
Taxonomy
Topics: Advanced Neural Network Applications
