How to improve CNN-based 6-DoF camera pose estimation

Soroush Seifi; Tinne Tuytelaars

arXiv:1909.10312 · cs.CV · December 2, 2019

TL;DR

This paper improves CNN-based 6-DoF camera pose estimation by adapting PoseNet to the characteristics of the datasets, adding a data augmentation scheme, and integrating LSTM cells, which together make PoseNet's pose estimates more accurate.

Contribution

It proposes three modifications to PoseNet for monocular camera pose estimation: prioritizing field-of-view over image resolution, a data augmentation scheme that reduces overfitting, and the use of LSTM cells. Combined, these modifications improve PoseNet's accuracy.

Findings

01

Combining the modifications improves PoseNet's pose accuracy.

02

Field-of-view is more critical than image resolution.

03

LSTM integration enhances temporal consistency.
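Finding 02 separates two properties that are easy to conflate when preparing network inputs. The sketch below is illustrative and not taken from the paper: it assumes an ideal pinhole camera, and the function names and example numbers (a 640-pixel-wide image with a 525-pixel focal length) are hypothetical. Downscaling the whole image to a smaller input size keeps the field of view and lowers the resolution, whereas center-cropping to the same size keeps the pixel scale but narrows the field of view.

```python
import math


def horizontal_fov_deg(width_px: float, focal_px: float) -> float:
    """Horizontal field of view of an ideal pinhole camera, in degrees."""
    return math.degrees(2.0 * math.atan(width_px / (2.0 * focal_px)))


def fov_after_resize(width_px: float, focal_px: float, new_width_px: float) -> float:
    # Resizing scales the focal length (in pixels) by the same factor as the
    # image width, so the field of view is unchanged; only resolution drops.
    scale = new_width_px / width_px
    return horizontal_fov_deg(new_width_px, focal_px * scale)


def fov_after_center_crop(width_px: float, focal_px: float, crop_width_px: float) -> float:
    # Cropping keeps the focal length but discards the image borders,
    # so the field of view shrinks.
    return horizontal_fov_deg(crop_width_px, focal_px)


if __name__ == "__main__":
    # Hypothetical camera: 640 px wide, 525 px focal length, 224 px network input.
    print("original:", horizontal_fov_deg(640, 525))
    print("resized: ", fov_after_resize(640, 525, 224))
    print("cropped: ", fov_after_center_crop(640, 525, 224))
```

With these numbers the resized input keeps the original field of view of roughly 63 degrees, while the 224-pixel center crop sees only about a third of that angular width.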

Abstract

Convolutional neural networks (CNNs) and transfer learning have recently been used for 6 degrees of freedom (6-DoF) camera pose estimation. Although they do not reach the accuracy of visual SLAM-based approaches and are restricted to a specific environment, they excel in robustness and can be applied even to a single image. In this paper, we study PoseNet [1] and investigate modifications based on the characteristics of the datasets to improve the accuracy of its pose estimates. In particular, we emphasize the importance of field-of-view over image resolution, present a data augmentation scheme to reduce overfitting, and study the effect of Long Short-Term Memory (LSTM) cells. Lastly, we combine these modifications and improve PoseNet's performance for monocular CNN-based camera pose regression.
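PoseNet [1] regresses a camera position and an orientation quaternion directly from a single image. The sketch below shows a training loss in the style of the original PoseNet formulation: a Euclidean position error plus a beta-weighted error between unit quaternions. It is an assumption for illustration, not necessarily the loss used in this paper; the function name and the default value of `beta` are hypothetical placeholders, and in practice beta, which balances position units against unitless quaternion error, would need tuning per dataset.

```python
import math
from typing import Sequence


def _norm(v: Sequence[float]) -> float:
    """Euclidean (L2) norm of a vector."""
    return math.sqrt(sum(c * c for c in v))


def posenet_style_loss(
    pred_xyz: Sequence[float],
    pred_quat: Sequence[float],
    true_xyz: Sequence[float],
    true_quat: Sequence[float],
    beta: float = 500.0,  # hypothetical weight; tune per dataset
) -> float:
    """Position error plus beta-weighted orientation error (PoseNet-style sketch)."""
    # Distance between predicted and ground-truth camera positions.
    pos_err = _norm([p - t for p, t in zip(pred_xyz, true_xyz)])
    # The network output is not constrained to unit length, so normalize
    # the predicted quaternion before comparing it with the (unit) ground truth.
    qn = _norm(pred_quat)
    unit_pred = [c / qn for c in pred_quat]
    rot_err = _norm([p - t for p, t in zip(unit_pred, true_quat)])
    return pos_err + beta * rot_err


if __name__ == "__main__":
    # Identical orientation (after normalization), position off by a 3-4-5 triangle.
    print(posenet_style_loss([3.0, 4.0, 0.0], [2.0, 0.0, 0.0, 0.0],
                             [0.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0]))
```

Note that q and -q encode the same rotation; this sketch does not handle that sign ambiguity, so ground-truth quaternions are assumed to be expressed in a consistent hemisphere.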