Back to the Feature: Learning Robust Camera Localization from Pixels to Pose
Paul-Edouard Sarlin, Ajaykumar Unagar, Måns Larsson, Hugo Germain, Carl Toft, Viktor Larsson, Marc Pollefeys, Vincent Lepetit, Lars Hammarstrand, Fredrik Kahl, Torsten Sattler

TL;DR
PixLoc is a scene-agnostic neural network that estimates accurate 6-DoF camera poses by aligning deep features, emphasizing robust feature learning over scene-specific geometric regression, leading to better generalization across scenes.
Contribution
We propose PixLoc, a novel approach that separates feature learning from geometric estimation, enabling robust, scene-agnostic camera localization through deep feature alignment.
Findings
PixLoc generalizes well to new scenes.
It localizes in large environments when given only a coarse pose prior.
It can also improve the accuracy of sparse feature matching by refining keypoints and poses jointly.
Abstract
Camera pose estimation in known scenes is a 3D geometry task recently tackled by multiple learning algorithms. Many regress precise geometric quantities, like poses or 3D points, from an input image. This either fails to generalize to new viewpoints or ties the model parameters to a specific scene. In this paper, we go Back to the Feature: we argue that deep networks should focus on learning robust and invariant visual features, while the geometric estimation should be left to principled algorithms. We introduce PixLoc, a scene-agnostic neural network that estimates an accurate 6-DoF pose from an image and a 3D model. Our approach is based on the direct alignment of multiscale deep features, casting camera localization as metric learning. PixLoc learns strong data priors by end-to-end training from pixels to pose and exhibits exceptional generalization to new scenes by separating model…
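The direct alignment idea in the abstract can be illustrated with a small toy, which is a sketch under strong assumptions and not the PixLoc implementation. PixLoc optimizes a full 6-DoF pose over multiscale deep features; here the pose is reduced to a 3D translation, the learned feature map is replaced by a hypothetical analytic feature field (`features`), and the reference features are synthesized from a known pose. All function names and the Levenberg-Marquardt damping schedule are illustrative choices. The point is only to show how a pose can be recovered by minimizing a feature residual at projected 3D points, starting from a coarse prior.

```python
import numpy as np

def project(points, t):
    """Pinhole projection (focal length 1) of 3D points shifted by translation t."""
    q = points + t
    inv_z = 1.0 / q[:, 2]
    uv = q[:, :2] * inv_z[:, None]
    # Jacobian of (u, v) with respect to t, shape (N, 2, 3).
    jac = np.zeros((len(q), 2, 3))
    jac[:, 0, 0] = inv_z
    jac[:, 1, 1] = inv_z
    jac[:, 0, 2] = -q[:, 0] * inv_z**2
    jac[:, 1, 2] = -q[:, 1] * inv_z**2
    return uv, jac

def features(uv):
    """Hypothetical smooth 3-channel feature field standing in for a CNN feature map."""
    u, v = uv[:, 0], uv[:, 1]
    f = np.stack([np.sin(3 * u), np.sin(3 * v), np.sin(2 * (u + v))], axis=1)
    # Jacobian of the features with respect to (u, v), shape (N, 3, 2).
    jac = np.zeros((len(uv), 3, 2))
    jac[:, 0, 0] = 3 * np.cos(3 * u)
    jac[:, 1, 1] = 3 * np.cos(3 * v)
    jac[:, 2, 0] = 2 * np.cos(2 * (u + v))
    jac[:, 2, 1] = 2 * np.cos(2 * (u + v))
    return f, jac

def residual_and_jacobian(points, f_ref, t):
    """Stacked feature residual and its Jacobian with respect to t (chain rule)."""
    uv, j_proj = project(points, t)
    f, j_feat = features(uv)
    r = (f - f_ref).reshape(-1)
    J = np.einsum("nij,njk->nik", j_feat, j_proj).reshape(-1, 3)
    return r, J

def align(points, f_ref, t_init, iters=30, lam=1e-3):
    """Levenberg-Marquardt on the feature residual; a step is kept only if the cost drops."""
    t = np.asarray(t_init, dtype=float).copy()
    r, J = residual_and_jacobian(points, f_ref, t)
    for _ in range(iters):
        delta = -np.linalg.solve(J.T @ J + lam * np.eye(3), J.T @ r)
        r_new, J_new = residual_and_jacobian(points, f_ref, t + delta)
        if r_new @ r_new < r @ r:
            t, r, J = t + delta, r_new, J_new
            lam *= 0.1   # trust the Gauss-Newton step more
        else:
            lam *= 10.0  # fall back toward gradient descent
    return t

# Synthetic scene: 50 points in front of the camera, reference features from a known pose.
rng = np.random.default_rng(0)
points = np.column_stack([
    rng.uniform(-1, 1, 50), rng.uniform(-1, 1, 50), rng.uniform(4, 6, 50)])
t_true = np.array([0.2, -0.1, 0.3])
f_ref, _ = features(project(points, t_true)[0])

t_est = align(points, f_ref, t_init=np.zeros(3))  # zero translation as the coarse prior
print("estimated translation:", t_est)
```

In this sketch the geometry (projection, Jacobians, damped least squares) is entirely hand-written and has no learned parameters, which mirrors the abstract's argument that a network should supply the features while the pose is estimated by a principled algorithm.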
Taxonomy
Methods: PixLoc
