RoMa v2: Harder Better Faster Denser Feature Matching
Johan Edstedt, David Nordström, Yushan Zhang, Georg Bökman, Jonathan Astermark, Viktor Larsson, Anders Heyden, Fredrik Kahl, Mårten Wadenbäck, Michael Felsberg

TL;DR
RoMa v2 introduces a series of architectural and training improvements for dense feature matching, achieving state-of-the-art accuracy and robustness in complex real-world scenarios while enhancing speed and memory efficiency.
Contribution
The paper presents a novel matching architecture, a new loss function, a two-stage pipeline, and integration with DINOv3 to significantly improve dense feature matching performance.
Findings
Sets new state-of-the-art accuracy in dense matching
Improves robustness to complex real-world scenarios
Speeds up training and reduces refinement memory usage
Abstract
Dense feature matching aims to estimate all correspondences between two images of a 3D scene and has recently been established as the gold standard due to its high accuracy and robustness. However, existing dense matchers still fail or perform poorly in many hard real-world scenarios, and high-precision models are often slow, limiting their applicability. In this paper, we attack these weaknesses on a wide front through a series of systematic improvements that together yield a significantly better model. In particular, we construct a novel matching architecture and loss, which, combined with a curated diverse training distribution, enables our model to solve many complex matching tasks. We further make training faster through a decoupled two-stage matching-then-refinement pipeline, and at the same time, significantly reduce refinement memory usage through a custom CUDA kernel. Finally,…
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Robotics and Sensor-Based Localization · Advanced Neural Network Applications
