End-to-end learning of keypoint detector and descriptor for pose invariant 3D matching
Georgios Georgakis, Srikrishna Karanam, Ziyan Wu, Jan Ernst, and Jana Kosecka

TL;DR
This paper introduces an end-to-end learning framework for jointly detecting keypoints and extracting descriptors from 3D scans, improving 3D matching accuracy without requiring separate annotations.
Contribution
It presents a joint learning approach to keypoint detection and description for 3D data that is optimized directly for the matching task, whereas previous methods trained the two stages separately or focused on image data.
Findings
Significant improvement in matching performance over state-of-the-art methods on benchmark datasets.
Effective joint optimization of keypoint detection and description.
Automatic sampling of positive and negative examples based on pose labels.
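
The pose-based sampling mentioned in the findings can be illustrated with a minimal sketch. This summary does not describe how the paper implements it, so everything below is an assumption: keypoints are treated as plain 3D points, the relative pose between two scans is given as a rotation and translation, and the function name `label_pairs_by_pose` and the threshold `pos_thresh` are hypothetical. A pair is labeled positive when the two keypoints land close together after the known pose is applied, and negative otherwise.

```python
import numpy as np

def label_pairs_by_pose(points_a, points_b, R_ab, t_ab, pos_thresh=0.05):
    """Label every (i, j) keypoint pair as positive (True) or negative (False).

    points_a: (N, 3) keypoint locations in the frame of scan A.
    points_b: (M, 3) keypoint locations in the frame of scan B.
    R_ab, t_ab: rotation (3, 3) and translation (3,) mapping frame A to frame B.
    pos_thresh: hypothetical distance threshold, in the units of the points.
    """
    # Move the keypoints of A into the coordinate frame of B.
    a_in_b = points_a @ R_ab.T + t_ab
    # Pairwise Euclidean distances between transformed A and B, shape (N, M).
    dists = np.linalg.norm(a_in_b[:, None, :] - points_b[None, :, :], axis=-1)
    # Pairs closer than the threshold are treated as the same physical point.
    return dists < pos_thresh

# Toy example: scan B sees the same two points as scan A, under a
# 90-degree rotation about the z-axis plus a small translation.
theta = np.pi / 2
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])
t = np.array([0.1, 0.0, 0.0])
points_a = np.array([[1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0]])
points_b = points_a @ R.T + t

labels = label_pairs_by_pose(points_a, points_b, R, t)
```

In this toy case `labels` is a 2x2 boolean matrix whose diagonal marks the positive pairs. In a training setup of this kind, such labels would typically feed a pairwise loss on the descriptors, which is why no manual correspondence annotation is needed beyond the pose.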
Abstract
Finding correspondences between images or 3D scans is at the heart of many computer vision and image retrieval applications and is often enabled by matching local keypoint descriptors. Various learning approaches have been applied in the past to different stages of the matching pipeline, considering detector, descriptor, or metric learning objectives. These objectives were typically addressed separately and most previous work has focused on image data. This paper proposes an end-to-end learning framework for keypoint detection and its representation (descriptor) for 3D depth maps or 3D scans, where the two can be jointly optimized towards task-specific objectives without a need for separate annotations. We employ a Siamese architecture augmented by a sampling layer and a novel score loss function which in turn affects the selection of region proposals. The positive and negative examples…
