TL;DR
MV-RoMa is a multi-view dense matching model that improves 3D reconstruction by jointly estimating consistent correspondences across multiple images, outperforming existing methods in accuracy and density.
Contribution
The paper introduces MV-RoMa, a novel multi-view dense matching architecture that efficiently integrates pairwise matching results and refines correspondences for better 3D reconstruction.
Findings
Produces more reliable correspondences across views.
Generates denser and more accurate 3D reconstructions.
Outperforms existing sparse and dense matching methods on benchmarks.
Abstract
Establishing consistent correspondences across images is essential for 3D vision tasks such as structure-from-motion (SfM), yet most existing matchers operate in a pairwise manner, often producing fragmented and geometrically inconsistent tracks when their predictions are chained across views. We propose MV-RoMa, a multi-view dense matching model that jointly estimates dense correspondences from a source image to multiple co-visible targets. Specifically, we design an efficient model architecture that avoids the high computational cost of full cross-attention for multi-view feature interaction: (i) a multi-view encoder that leverages pairwise matching results as a geometric prior, and (ii) a multi-view matching refiner that refines correspondences using pixel-wise attention. Additionally, we propose a post-processing strategy that integrates our model's consistent multi-view correspondences…
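The failure mode the abstract describes, where chained pairwise matches yield inconsistent tracks, can be illustrated with a toy example. The sketch below is not MV-RoMa or any part of the paper's pipeline; it is a generic union-find track builder of the kind commonly used in SfM, with hypothetical names (`build_tracks`, `is_consistent`) and made-up match data. It merges pairwise matches into tracks and flags a track as inconsistent when it contains two different keypoints from the same image.

```python
from collections import defaultdict


def build_tracks(pairwise_matches):
    """Merge pairwise matches into tracks using union-find.

    pairwise_matches: iterable of ((image_id, keypoint_id), (image_id, keypoint_id)).
    Returns a list of tracks, each a sorted list of (image_id, keypoint_id).
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairwise_matches:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    tracks = defaultdict(list)
    for node in list(parent):
        tracks[find(node)].append(node)
    return [sorted(track) for track in tracks.values()]


def is_consistent(track):
    """A track is consistent if it has at most one keypoint per image."""
    images = [image_id for image_id, _ in track]
    return len(images) == len(set(images))


# Toy data (assumed for illustration): the direct A->C match disagrees
# with the chained A->B->C match, so one track ends up with two
# keypoints in image C.
matches = [
    (("A", 0), ("B", 0)),
    (("B", 0), ("C", 0)),
    (("A", 0), ("C", 1)),
    (("A", 5), ("B", 7)),
]

tracks = build_tracks(matches)
inconsistent = [t for t in tracks if not is_consistent(t)]
print(len(tracks), "tracks,", len(inconsistent), "inconsistent")
```

A matcher that predicts correspondences from one source image to all co-visible targets jointly, as the abstract proposes, aims to avoid this kind of conflict at the source rather than filtering it out afterwards.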
