TL;DR
This paper introduces a simplified approach to 6D camera localization in which a neural network regresses scene coordinates, outperforming prior methods without requiring a 3D scene model during training.
Contribution
Demonstrates that learning a single scene coordinate regression component suffices for accurate 6D camera pose estimation, simplifying the pipeline and improving generalization.
Findings
Outperforms the state of the art on indoor and outdoor datasets.
Does not require a 3D scene model during training.
Achieves high accuracy and robustness with an end-to-end trainable system.
Abstract
Popular research areas like autonomous driving and augmented reality have renewed the interest in image-based camera localization. In this work, we address the task of predicting the 6D camera pose from a single RGB image in a given 3D environment. With the advent of neural networks, previous works have either learned the entire camera localization process, or multiple components of a camera localization pipeline. Our key contribution is to demonstrate and explain that learning a single component of this pipeline is sufficient. This component is a fully convolutional neural network for densely regressing so-called scene coordinates, defining the correspondence between the input image and the 3D scene space. The neural network is prepended to a new end-to-end trainable pipeline. Our system is efficient, highly accurate, robust in training, and exhibits outstanding generalization…
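To make the role of scene coordinates concrete: each pixel paired with its predicted 3D scene coordinate forms a 2D-3D correspondence, and a candidate camera pose can be checked by reprojecting those 3D points into the image. The sketch below is a generic illustration of that check, not the paper's pipeline. The function names, the pinhole camera model with a single focal length, and the 10-pixel inlier threshold are all assumptions made for this example.

```python
import math

def project(point, rotation, translation, focal, center):
    """Project a 3D scene coordinate to a pixel with a pinhole camera (assumed model)."""
    # Scene frame -> camera frame: X_cam = R * X_scene + t
    cam = [
        sum(rotation[r][c] * point[c] for c in range(3)) + translation[r]
        for r in range(3)
    ]
    if cam[2] <= 0:
        return None  # point lies behind the camera, no valid projection
    return (
        focal * cam[0] / cam[2] + center[0],
        focal * cam[1] / cam[2] + center[1],
    )

def count_inliers(pixels, scene_coords, rotation, translation,
                  focal, center, threshold=10.0):
    """Count correspondences whose reprojection error is below `threshold` pixels.

    `threshold` is a hypothetical value chosen for illustration.
    """
    inliers = 0
    for (u, v), point in zip(pixels, scene_coords):
        projected = project(point, rotation, translation, focal, center)
        if projected is None:
            continue
        error = math.hypot(projected[0] - u, projected[1] - v)
        if error < threshold:
            inliers += 1
    return inliers

# Toy example: identity pose, three pixels with predicted scene coordinates.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
pixels = [(320.0, 240.0), (571.0, 241.0), (100.0, 100.0)]
scene_coords = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0), (0.0, 1.0, 4.0)]
score = count_inliers(pixels, scene_coords, identity, [0.0, 0.0, 0.0],
                      focal=500.0, center=(320.0, 240.0))
```

In this toy example the first two correspondences agree with the identity pose and the third does not. A robust estimator would typically score many pose hypotheses this way and keep the one with the most support.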
