Understanding the Limitations of CNN-based Absolute Camera Pose Regression
Torsten Sattler, Qunjie Zhou, Marc Pollefeys, Laura Leal-Taixé

TL;DR
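This paper develops a theoretical model of CNN-based absolute camera pose regression, uses it to explain why these methods fall short of 3D structure-based localization, and shows that they often fail to outperform image retrieval methods, suggesting that more research is needed before pose regression can compete with structure-based approaches.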
Contribution
The paper introduces a theoretical framework for understanding CNN-based pose regression and demonstrates its limitations through experiments, highlighting the gap with structure-based methods.
Findings
Pose regression often fails in complex scenes.
Current CNN-based pose regression approaches do not consistently outperform a simple image retrieval baseline.
Pose regression is more akin to pose approximation than precise estimation.
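To make the contrast between "pose approximation" and precise estimation concrete, here is a minimal sketch of a retrieval-style baseline: it finds the database images whose descriptors are most similar to the query and blends their known camera positions. This is an illustration under stated assumptions, not the authors' implementation. The function names, the cosine similarity measure, the similarity-weighted averaging, and the toy descriptors are all hypothetical choices made for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length descriptor vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def approximate_position(query_desc, db_descs, db_positions, k=2):
    """Approximate the query camera position from its k most similar
    database images (hypothetical baseline, for illustration only).

    Returns the blended position and the indices of the retrieved images.
    """
    sims = [cosine_similarity(query_desc, d) for d in db_descs]
    # Rank database images by descending similarity and keep the top k.
    ranked = sorted(range(len(db_descs)), key=lambda i: sims[i], reverse=True)[:k]
    # Use non-negative similarities as blending weights (an assumption).
    weights = [max(sims[i], 0.0) for i in ranked]
    total = sum(weights)
    if total == 0.0:
        # Fall back to a plain average if no retrieved image is similar.
        weights = [1.0] * len(ranked)
        total = float(len(ranked))
    dim = len(db_positions[0])
    position = [
        sum(w * db_positions[i][c] for w, i in zip(weights, ranked)) / total
        for c in range(dim)
    ]
    return position, ranked


# Toy data: three database images with 2-D descriptors and 3-D positions.
db_descs = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
db_positions = [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [10.0, 0.0, 0.0]]

position, retrieved = approximate_position([1.0, 1.0], db_descs, db_positions, k=2)
print(retrieved, position)
```

Because the weights are non-negative and sum to one, the returned position is a convex combination of the retrieved database positions, so it can never lie outside the region those positions span. That is the sense in which such a baseline approximates a pose rather than estimating it from scene geometry.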
Abstract
Visual localization is the task of accurate camera pose estimation in a known scene. It is a key problem in computer vision and robotics, with applications including self-driving cars, Structure-from-Motion, SLAM, and Mixed Reality. Traditionally, the localization problem has been tackled using 3D geometry. Recently, end-to-end approaches based on convolutional neural networks have become popular. These methods learn to directly regress the camera pose from an input image. However, they do not achieve the same level of pose accuracy as 3D structure-based methods. To understand this behavior, we develop a theoretical model for camera pose regression. We use our model to predict failure cases for pose regression techniques and verify our predictions through experiments. We furthermore use our model to show that pose regression is more closely related to pose approximation via image…
Taxonomy
Topics: Robotics and Sensor-Based Localization · Advanced Vision and Imaging · 3D Surveying and Cultural Heritage
