TL;DR
This paper introduces a hierarchical scene coordinate network that predicts per-pixel 3D scene coordinates in a coarse-to-fine manner, improving the robustness and accuracy of single-image RGB localization in large and ambiguous environments.
Contribution
The work presents a hierarchical network architecture for scene coordinate regression that scales to large environments and outperforms previous regression-based methods in localization accuracy.
Findings
Outperforms baseline regression networks in accuracy.
Achieves state-of-the-art results on multiple localization datasets.
Effective for both indoor and outdoor large-scale environments.
Abstract
Visual localization is critical to many applications in computer vision and robotics. To address single-image RGB localization, state-of-the-art feature-based methods match local descriptors between a query image and a pre-built 3D model. Recently, deep neural networks have been exploited to regress the mapping between raw pixels and 3D coordinates in the scene, and thus the matching is implicitly performed by the forward pass through the network. However, in a large and ambiguous environment, learning such a regression task directly can be difficult for a single network. In this work, we present a new hierarchical scene coordinate network to predict pixel scene coordinates in a coarse-to-fine manner from a single RGB image. The network consists of a series of output layers, each of them conditioned on the previous ones. The final output layer predicts the 3D coordinates and the others…
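The coarse-to-fine idea in the abstract can be illustrated with a small sketch. This is not the authors' network or their scene partition: it assumes, purely for illustration, a uniform grid that splits the scene into `branching` cells per axis at each level, and the helper names `encode` and `decode` are hypothetical. It shows how a 3D scene coordinate can be written as a sequence of coarse-to-fine cell labels plus a small residual offset, and how the coordinate is recovered from them.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def encode(point: Sequence[float], origin: Sequence[float], extent: float,
           branching: int = 4, levels: int = 2) -> Tuple[List[int], Point]:
    """Split a 3D point into per-level cell labels plus a residual.

    Assumption: the scene is a cube of side `extent` starting at `origin`,
    subdivided uniformly into `branching` cells per axis at every level.
    """
    labels: List[int] = []
    cell_origin = list(origin)
    cell_size = extent
    for _ in range(levels):
        cell_size /= branching
        idx = []
        for axis in range(3):
            i = int((point[axis] - cell_origin[axis]) // cell_size)
            i = min(max(i, 0), branching - 1)  # clamp points on the boundary
            idx.append(i)
            cell_origin[axis] += i * cell_size
        # Flatten the (x, y, z) cell index into a single class label.
        labels.append((idx[0] * branching + idx[1]) * branching + idx[2])
    # Residual: offset of the point from the centre of the finest cell.
    residual = tuple(point[a] - (cell_origin[a] + cell_size / 2) for a in range(3))
    return labels, residual


def decode(labels: Sequence[int], residual: Sequence[float],
           origin: Sequence[float], extent: float, branching: int = 4) -> Point:
    """Recover the 3D point from coarse-to-fine labels and the residual."""
    cell_origin = list(origin)
    cell_size = extent
    for label in labels:
        cell_size /= branching
        ix, rem = divmod(label, branching * branching)
        iy, iz = divmod(rem, branching)
        for axis, i in enumerate((ix, iy, iz)):
            cell_origin[axis] += i * cell_size
    return tuple(cell_origin[a] + cell_size / 2 + residual[a] for a in range(3))


if __name__ == "__main__":
    labels, residual = encode((3.2, 7.9, 0.4), origin=(0.0, 0.0, 0.0), extent=16.0)
    print(labels, residual)
    print(decode(labels, residual, origin=(0.0, 0.0, 0.0), extent=16.0))
```

In a learned setting, each label would be predicted by one output layer conditioned on the coarser predictions, and the final layer would output the 3D coordinate; the fixed grid here only stands in for whatever partition the scene actually uses.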
Videos
Hierarchical Scene Coordinate Classification and Regression for Visual Localization (YouTube)
