2D3D-MATR: 2D-3D Matching Transformer for Detection-free Registration between Images and Point Clouds
Minhao Li, Zheng Qin, Zhirui Gao, Renjiao Yi, Chenyang Zhu, Yulan Guo, Kai Xu

TL;DR
The paper introduces 2D3D-MATR, a detection-free transformer-based method for accurate registration between images and point clouds, overcoming cross-modality challenges with a coarse-to-fine matching approach.
Contribution
It proposes a novel detection-free registration method using transformers and multi-scale patch matching, improving robustness and accuracy over existing methods.
Findings
Outperforms P2-Net by around 20 percentage points in inlier ratio
Achieves over 10 points higher registration recall than P2-Net
Demonstrates robustness on public benchmarks
Abstract
The commonly adopted detect-then-match approach to registration struggles in cross-modality cases because of incompatible keypoint detection and inconsistent feature description. We propose 2D3D-MATR, a detection-free method for accurate and robust registration between images and point clouds. Our method adopts a coarse-to-fine pipeline: it first computes coarse correspondences between downsampled patches of the input image and the point cloud, then extends them to form dense correspondences between pixels and points within the patch region. The coarse-level patch matching is based on a transformer, which jointly learns global contextual constraints with self-attention and cross-modality correlations with cross-attention. To resolve the scale ambiguity in patch matching, we construct a multi-scale pyramid for each image patch and learn to find for each point patch the…
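The coarse-to-fine idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes patch-level and pixel/point-level features are already computed, replaces the transformer and the multi-scale pyramid with plain cosine similarity, and uses mutual nearest neighbors for the dense step. All names (`coarse_to_fine_match`, `pixel_patch_ids`, `top_k`, and so on) are hypothetical.

```python
import numpy as np


def l2_normalize(x, eps=1e-8):
    """Scale each row to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def coarse_to_fine_match(img_patch_feats, pc_patch_feats,
                         pixel_feats, point_feats,
                         pixel_patch_ids, point_patch_ids, top_k=2):
    """Illustrative coarse-to-fine 2D-3D matching (assumed design).

    img_patch_feats: (M, C) features of image patches
    pc_patch_feats:  (N, C) features of point patches
    pixel_feats:     (P, D) per-pixel features
    point_feats:     (Q, D) per-point features
    pixel_patch_ids: (P,) index of the image patch owning each pixel
    point_patch_ids: (Q,) index of the point patch owning each point
    Returns a list of (pixel_index, point_index) correspondences.
    """
    # Coarse stage: cosine similarity between all patch pairs,
    # keeping the top_k highest-scoring (image patch, point patch) pairs.
    sim = l2_normalize(img_patch_feats) @ l2_normalize(pc_patch_feats).T
    k = min(top_k, sim.size)
    flat_idx = np.argsort(sim, axis=None)[::-1][:k]
    coarse_pairs = [np.unravel_index(i, sim.shape) for i in flat_idx]

    # Fine stage: inside each matched patch pair, keep pixel-point pairs
    # that are mutual nearest neighbors in feature space.
    matches = []
    for m, n in coarse_pairs:
        pix = np.flatnonzero(pixel_patch_ids == m)
        pts = np.flatnonzero(point_patch_ids == n)
        if pix.size == 0 or pts.size == 0:
            continue
        local = l2_normalize(pixel_feats[pix]) @ l2_normalize(point_feats[pts]).T
        best_pt = local.argmax(axis=1)  # best point for each pixel
        best_px = local.argmax(axis=0)  # best pixel for each point
        for i, j in enumerate(best_pt):
            if best_px[j] == i:
                matches.append((int(pix[i]), int(pts[j])))
    return matches
```

Restricting the dense search to matched patch pairs keeps the fine stage local, which is the reason a coarse stage helps: pixels are only compared against points in regions already judged to overlap. Estimating the camera pose from the resulting correspondences is outside the scope of this sketch.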
Taxonomy
Topics: Robotics and Sensor-Based Localization · Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques
