DDM-NET: End-to-end learning of keypoint feature Detection, Description and Matching for 3D localization
Xiangyu Xu, Li Guan, Enrique Dunn, Haoxiang Li, Gang Hua

TL;DR
This paper introduces DDM-NET, an end-to-end deep learning framework that jointly learns keypoint detection, description, and matching for image-based 3D localization, and reports better results than traditional and weakly supervised methods on public datasets.
Contribution
It presents a novel integrated network with self-supervised and weakly-supervised losses, enabling holistic training for improved 3D localization accuracy.
Findings
Outperforms traditional and weakly supervised methods on public datasets.
Enables end-to-end training of keypoint detection, description, and matching.
Achieves more accurate localization results.
Abstract
In this paper, we propose an end-to-end framework that jointly learns keypoint detection, descriptor representation and cross-frame matching for the task of image-based 3D localization. Prior art has tackled each of these components individually, purportedly aiming to alleviate difficulties in effectively training a holistic network. We design a self-supervised image warping correspondence loss for both feature detection and matching, a weakly-supervised epipolar constraints loss on relative camera pose learning, and a directional matching scheme that detects keypoint features in a source image and performs coarse-to-fine correspondence search on the target image. We leverage this framework to enforce cycle consistency in our matching module. In addition, we propose a new loss to robustly handle both definite inlier/outlier matches and less-certain matches. The integration of these…
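The abstract names two geometric ideas that can be illustrated generically: an epipolar constraint derived from relative camera pose, and cycle consistency between source-to-target and target-to-source matches. The NumPy sketch below is not the paper's loss or matching module; it only shows the textbook quantities such terms are usually built on. All function names are hypothetical, the pose convention X_tgt = R X_src + t is an assumption, and mutual nearest-neighbor filtering stands in for whatever cycle-consistency mechanism the authors actually use.

```python
import numpy as np


def skew(t):
    """Return the 3x3 cross-product matrix [t]_x so that skew(t) @ v == np.cross(t, v)."""
    tx, ty, tz = t
    return np.array([[0.0, -tz, ty],
                     [tz, 0.0, -tx],
                     [-ty, tx, 0.0]])


def essential_from_pose(R, t):
    """Essential matrix E = [t]_x R, assuming X_tgt = R @ X_src + t (illustrative convention)."""
    return skew(t) @ R


def epipolar_residual(x_src, x_tgt, E):
    """Absolute algebraic epipolar error |x_tgt^T E x_src| for each match.

    x_src, x_tgt: (N, 2) arrays of normalized (calibrated) image coordinates.
    A geometrically consistent match has a residual close to zero.
    """
    ones = np.ones((x_src.shape[0], 1))
    xs = np.hstack([x_src, ones])  # homogeneous source points, shape (N, 3)
    xt = np.hstack([x_tgt, ones])  # homogeneous target points, shape (N, 3)
    return np.abs(np.einsum("ni,ij,nj->n", xt, E, xs))


def mutual_nearest_neighbors(desc_src, desc_tgt):
    """Keep source->target matches whose target->source match returns to the same source.

    desc_src: (N, D) and desc_tgt: (M, D) descriptor arrays, compared by dot product.
    Returns a (K, 2) integer array of (source_index, target_index) pairs.
    """
    sim = desc_src @ desc_tgt.T          # (N, M) similarity matrix
    fwd = sim.argmax(axis=1)             # best target for each source
    bwd = sim.argmax(axis=0)             # best source for each target
    idx = np.arange(desc_src.shape[0])
    keep = bwd[fwd] == idx               # the cycle source -> target -> source closes
    return np.stack([idx[keep], fwd[keep]], axis=1)
```

In a training setup, a residual like `epipolar_residual` could be averaged over predicted matches as a weak supervision signal when only relative pose is known, and a check like `mutual_nearest_neighbors` could filter or penalize matches that are not consistent in both directions. How DDM-NET weights or combines such terms is not stated in the text above.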
Taxonomy
Topics: Robotics and Sensor-Based Localization · Advanced Image and Video Retrieval Techniques · Human Pose and Action Recognition
