How to improve CNN-based 6-DoF camera pose estimation
Soroush Seifi, Tinne Tuytelaars

TL;DR
This paper enhances CNN-based 6-DoF camera pose estimation by exploring dataset characteristics, data augmentation, and LSTM integration to improve accuracy and robustness over existing methods.
Contribution
It introduces modifications such as emphasizing field-of-view, a data augmentation scheme, and LSTM integration to improve PoseNet's accuracy for monocular camera pose estimation.
Findings
Combining the modifications improves PoseNet's pose accuracy.
Field-of-view is more critical than image resolution.
LSTM integration enhances temporal consistency.
Abstract
Convolutional neural networks (CNNs) and transfer learning have recently been used for 6 degrees of freedom (6-DoF) camera pose estimation. While they do not reach the same accuracy as visual SLAM-based approaches and are restricted to a specific environment, they excel in robustness and can be applied even to a single image. In this paper, we study PoseNet [1] and investigate modifications based on datasets' characteristics to improve the accuracy of the pose estimates. In particular, we emphasize the importance of field-of-view over image resolution, we present a data augmentation scheme to reduce overfitting, and we study the effect of Long Short-Term Memory (LSTM) cells. Lastly, we combine these modifications and improve PoseNet's performance for monocular CNN-based camera pose regression.
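The field-of-view versus resolution trade-off mentioned in the abstract can be illustrated with a minimal sketch. It assumes a PoseNet-style input pipeline that feeds the network a fixed 224x224 image, and it compares two ways of getting there: a center crop, which keeps pixel density but discards the image borders, and a downscale of the full frame, which keeps the whole view at lower resolution. The function names, the 455x256 frame size, the 60 degree camera field of view, and the nearest-neighbour resampling are all illustrative assumptions, not details taken from the paper.

```python
import math

import numpy as np


def center_crop(image, size):
    """Keep a size x size window around the image centre (narrower view)."""
    h, w = image.shape[:2]
    top = (h - size) // 2
    left = (w - size) // 2
    return image[top:top + size, left:left + size]


def downscale_full_frame(image, out_h, out_w):
    """Nearest-neighbour downscale of the whole frame (full view, fewer pixels).

    Hypothetical helper: a real pipeline would likely use an anti-aliased
    resize, and squashing to a square changes the aspect ratio.
    """
    h, w = image.shape[:2]
    rows = (np.arange(out_h) * h) // out_h
    cols = (np.arange(out_w) * w) // out_w
    return image[rows[:, None], cols]


def cropped_fov_deg(full_fov_deg, original_width, kept_width):
    """Horizontal field of view left after a centred crop (pinhole model)."""
    half = math.radians(full_fov_deg) / 2.0
    return math.degrees(2.0 * math.atan(math.tan(half) * kept_width / original_width))


# Assumed example frame: 455 x 256 pixels from a camera with a 60 degree
# horizontal field of view (both values are made up for illustration).
frame = np.zeros((256, 455, 3), dtype=np.uint8)

cropped = center_crop(frame, 224)               # same pixel density, narrower view
scaled = downscale_full_frame(frame, 224, 224)  # full view, coarser pixels

fov_after_crop = cropped_fov_deg(60.0, frame.shape[1], cropped.shape[1])
```

Both outputs have the same shape, so either can be fed to the same network; only the crop reduces the field of view, which is the quantity the abstract identifies as more important than resolution.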
