TL;DR
PoseNet is a convolutional neural network that regresses the 6-DOF camera pose from a single RGB image in real time (about 5 ms per frame), indoors and outdoors, with no additional engineering or graph optimisation.
Contribution
The paper presents an end-to-end CNN approach to 6-DOF relocalization that runs in real time, leverages transfer learning from large-scale classification data, and is reported to be robust to difficult lighting, motion blur, and different camera intrinsics.
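This summary does not state the training objective, so the following is only a minimal sketch of one plausible pose regression loss. It assumes the network outputs a 3-vector position and a 4-vector quaternion (w, x, y, z); the function name `pose_regression_loss` and the weighting factor `beta` are hypothetical names introduced here for illustration.

```python
import math


def pose_regression_loss(x_pred, q_pred, x_true, q_true, beta):
    """Sketch of a combined position + orientation regression loss.

    Assumptions (not taken from the summary above): position is a
    3-vector, orientation is a quaternion (w, x, y, z), and `beta` is a
    hypothetical scalar that balances the two terms.
    """
    # Normalise the ground-truth quaternion to unit length.
    norm = math.sqrt(sum(c * c for c in q_true))
    q_unit = [c / norm for c in q_true]

    # Euclidean distance between predicted and true positions.
    position_term = math.dist(x_pred, x_true)
    # Euclidean distance between predicted and unit ground-truth quaternions.
    orientation_term = math.dist(q_pred, q_unit)

    return position_term + beta * orientation_term
```

A larger `beta` makes orientation errors count for more relative to position errors. Because position and orientation are in different units, a suitable value would have to be tuned per scene.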
Findings
Achieves approximately 2 m and 6° accuracy in large-scale outdoor scenes, and 0.5 m and 10° indoors
Runs in real time, taking 5 ms per frame
Robust to difficult lighting, motion blur, and different camera intrinsics
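The accuracy figures above pair a position error in metres with an orientation error in degrees. As a hedged illustration of how such a pair can be computed for one predicted pose, the sketch below assumes positions are 3-vectors and orientations are quaternions (w, x, y, z). The summary does not specify the representation, and the function names are hypothetical.

```python
import math


def position_error(p_pred, p_true):
    """Euclidean distance between two 3D positions (same units as the inputs)."""
    return math.dist(p_pred, p_true)


def orientation_error_deg(q_pred, q_true):
    """Angle in degrees between two orientations given as quaternions (w, x, y, z)."""
    # Normalise so the inputs need not be exactly unit length.
    n_pred = math.sqrt(sum(c * c for c in q_pred))
    n_true = math.sqrt(sum(c * c for c in q_true))
    dot = sum(a * b for a, b in zip(q_pred, q_true)) / (n_pred * n_true)
    # q and -q encode the same rotation, so take |dot|; clamp against rounding.
    dot = min(1.0, abs(dot))
    return math.degrees(2.0 * math.acos(dot))
```

For example, comparing the identity orientation `(1, 0, 0, 0)` with a quarter turn about the z axis, `(cos 45°, 0, 0, sin 45°)`, gives an orientation error of 90 degrees.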
Abstract
We present a robust and real-time monocular six degree of freedom relocalization system. Our system trains a convolutional neural network to regress the 6-DOF camera pose from a single RGB image in an end-to-end manner with no need of additional engineering or graph optimisation. The algorithm can operate indoors and outdoors in real time, taking 5ms per frame to compute. It obtains approximately 2m and 6 degree accuracy for large scale outdoor scenes and 0.5m and 10 degree accuracy indoors. This is achieved using an efficient 23 layer deep convnet, demonstrating that convnets can be used to solve complicated out of image plane regression problems. This was made possible by leveraging transfer learning from large scale classification data. We show the convnet localizes from high level features and is robust to difficult lighting, motion blur and different camera intrinsics where point…
