D2S: Representing sparse descriptors and 3D coordinates for camera relocalization
Bach-Thuan Bui, Huy-Hoang Bui, Dinh-Tuan Tran, and Joo-Ho Lee

TL;DR
D2S is a simple, cost-effective learning-based method for camera relocalization. A lightweight network represents sparse local descriptors and their 3D scene coordinates, and the method outperforms previous regression-based methods in both indoor and outdoor environments.
Contribution
The paper presents D2S, a novel approach that leverages a simple network with graph attention for efficient, scene-specific localization from a single RGB image, with improved generalization capabilities.
Findings
Outperforms previous regression-based methods in indoor and outdoor environments.
Effectively generalizes across day-night transitions and domain shifts.
Uses a lightweight model with selective attention for robust descriptor representation.
Abstract
State-of-the-art visual localization methods mostly rely on complex procedures to match local descriptors and 3D point clouds. However, these procedures can incur significant costs in terms of inference, storage, and updates over time. In this study, we propose a direct learning-based approach that utilizes a simple network named D2S to represent complex local descriptors and their scene coordinates. Our method is characterized by its simplicity and cost-effectiveness. It solely leverages a single RGB image for localization during the testing phase and only requires a lightweight model to encode a complex sparse scene. The proposed D2S employs a combination of a simple loss function and graph attention to selectively focus on robust descriptors while disregarding areas such as clouds, trees, and several dynamic objects. This selective attention enables D2S to effectively perform a…
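To make the idea of regressing scene coordinates from sparse descriptors while ignoring unreliable ones more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes, for illustration only, that the network emits four numbers per descriptor (a 3D coordinate plus a reliability logit), that the loss is a masked coordinate error plus a binary cross-entropy term on reliability, and that predictions are filtered with a fixed threshold at test time. The names `masked_loss`, `select_correspondences`, and the 0.5 threshold are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]
# Assumed output layout per descriptor: (x, y, z, reliability_logit).
Prediction = Tuple[float, float, float, float]


def sigmoid(v: float) -> float:
    """Numerically stable logistic function."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def masked_loss(
    preds: Sequence[Prediction],
    targets: Sequence[Coord],
    has_gt: Sequence[bool],
) -> float:
    """Illustrative loss (an assumption, not the paper's exact formula).

    Coordinate error is averaged only over descriptors that have a
    ground-truth 3D point; the reliability output is trained to be 1 for
    those descriptors and 0 for the rest (e.g. sky, foliage, moving objects).
    """
    if not preds:
        return 0.0
    coord_sum, rel_sum, n_gt = 0.0, 0.0, 0
    for (x, y, z, logit), target, gt in zip(preds, targets, has_gt):
        # Clamp so that log() never receives exactly 0.
        r = min(max(sigmoid(logit), 1e-7), 1.0 - 1e-7)
        if gt:
            coord_sum += math.dist((x, y, z), target)
            n_gt += 1
            rel_sum += -math.log(r)
        else:
            rel_sum += -math.log(1.0 - r)
    coord_term = coord_sum / n_gt if n_gt else 0.0
    return coord_term + rel_sum / len(preds)


def select_correspondences(
    keypoints: Sequence[Tuple[float, float]],
    preds: Sequence[Prediction],
    threshold: float = 0.5,  # hypothetical cut-off
) -> List[Tuple[Tuple[float, float], Coord]]:
    """Keep 2D-3D pairs whose predicted reliability passes the threshold."""
    pairs = []
    for (u, v), (x, y, z, logit) in zip(keypoints, preds):
        if sigmoid(logit) >= threshold:
            pairs.append(((u, v), (x, y, z)))
    return pairs


# Toy data: one reliable descriptor, one that should be ignored.
preds = [(1.0, 2.0, 3.0, 10.0), (0.0, 0.0, 0.0, -10.0)]
targets = [(1.0, 2.0, 3.0), (0.0, 0.0, 0.0)]
has_gt = [True, False]
loss = masked_loss(preds, targets, has_gt)
pairs = select_correspondences([(10.0, 20.0), (30.0, 40.0)], preds)
```

In a full pipeline, the surviving 2D-3D pairs would typically be passed to a PnP solver inside a RANSAC loop to recover the camera pose, as is common for scene-coordinate methods. That step is omitted here to keep the sketch dependency-free.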
Taxonomy
Topics: Robotics and Sensor-Based Localization · Advanced Image and Video Retrieval Techniques · Advanced Vision and Imaging
