2D3D-MatchNet: Learning to Match Keypoints Across 2D Image and 3D Point Cloud
Mengdan Feng, Sixing Hu, Marcelo Ang, Gim Hee Lee

TL;DR
This paper introduces 2D3D-MatchNet, a deep network that learns descriptors for 2D image keypoints and 3D point cloud keypoints so the two can be matched directly, giving the 2D-3D correspondences needed for visual pose estimation.
Contribution
The paper presents a novel end-to-end deep network that jointly learns descriptors for 2D and 3D keypoints, facilitating direct matching between images and point clouds.
Findings
Successfully establishes 2D-3D correspondences from images and point clouds.
Creates the Oxford 2D-3D Patches dataset from the Oxford Robotcar dataset, with ground truth camera poses and 2D-3D correspondences for training and testing.
Experimental results confirm the approach's feasibility.
Abstract
Large-scale point clouds generated from 3D sensors are more accurate than their image-based counterparts. However, they are seldom used in visual pose estimation due to the difficulty of obtaining 2D-3D image to point cloud correspondences. In this paper, we propose 2D3D-MatchNet, an end-to-end deep network architecture that jointly learns the descriptors for 2D and 3D keypoints from image and point cloud, respectively. As a result, we are able to directly match and establish 2D-3D correspondences between the query image and the 3D point cloud reference map for visual pose estimation. We create our Oxford 2D-3D Patches dataset from the Oxford Robotcar dataset with the ground truth camera poses and 2D-3D image to point cloud correspondences for training and testing the deep network. Experimental results verify the feasibility of our approach.
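The matching step described in the abstract can be illustrated with a minimal sketch. It assumes the 2D and 3D descriptors have already been computed and live in a shared vector space where Euclidean distance reflects similarity. The function names, the mutual nearest-neighbour check, the `max_dist` threshold, and the toy values are all illustrative assumptions, not the authors' implementation.

```python
import math

def l2(a, b):
    """Euclidean distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(query, candidates):
    """Index of the candidate descriptor closest to `query`."""
    return min(range(len(candidates)), key=lambda j: l2(query, candidates[j]))

def match_2d_3d(desc_2d, desc_3d, max_dist=float("inf")):
    """Return (i, j) pairs where 2D descriptor i and 3D descriptor j are
    mutual nearest neighbours and no farther apart than `max_dist`.
    (Mutual check and threshold are assumptions made for this sketch.)"""
    matches = []
    for i, d2 in enumerate(desc_2d):
        j = nearest(d2, desc_3d)
        if nearest(desc_3d[j], desc_2d) == i and l2(d2, desc_3d[j]) <= max_dist:
            matches.append((i, j))
    return matches

# Toy descriptors (hypothetical values, not taken from the paper).
desc_2d = [[0.0, 1.0], [1.0, 0.0], [5.0, 5.0]]    # image keypoint descriptors
desc_3d = [[0.9, 0.1], [0.1, 1.1], [-3.0, -3.0]]  # point cloud keypoint descriptors
matches = match_2d_3d(desc_2d, desc_3d, max_dist=0.5)
print(matches)
```

Each returned pair links an image keypoint index to a point cloud keypoint index. In a localization pipeline such pairs would typically be passed to a pose solver; that step is outside this sketch.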
Taxonomy
Topics: Robotics and Sensor-Based Localization · 3D Surveying and Cultural Heritage · Human Pose and Action Recognition
