TL;DR
RPNet is an end-to-end deep neural network that estimates full relative camera poses directly from image pairs. It outperforms traditional methods, especially on challenging images, and, to the authors' knowledge, is the first method to recover the complete translation vector.
Contribution
RPNet introduces a deep learning approach that recovers the full translation vector in relative pose estimation without requiring camera intrinsic or extrinsic parameters.
Findings
RPNet produces more accurate and more stable pose estimates than traditional SIFT + RANSAC approaches.
RPNet performs well on challenging images, such as those with repetitive textures or little texture.
RPNet successfully recovers full translation vectors in relative pose estimation.
Abstract
This paper addresses the task of relative camera pose estimation from raw image pixels by means of deep neural networks. The proposed RPNet network takes pairs of images as input and directly infers the relative poses, without requiring camera intrinsic or extrinsic parameters. While state-of-the-art systems based on SIFT + RANSAC can recover the translation vector only up to scale, RPNet is trained end-to-end to produce the full translation vector. Experiments on the Cambridge Landmarks dataset show very promising results for the recovery of the full translation vector. They also show that RPNet produces more accurate and more stable results than traditional approaches, especially on hard images (repetitive textures, textureless images, etc.). To the best of our knowledge, RPNet is the first attempt to recover full translation vectors in relative pose estimation.
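The abstract contrasts RPNet's full translation vector with the up-to-scale translation recovered by SIFT + RANSAC. The sketch below is not the authors' code. It illustrates, under stated assumptions, how a relative pose follows from two absolute camera poses and why only the translation direction survives when scene scale is unknown. It assumes world-to-camera poses x_cam = R x_world + t, and the helper names (`relative_pose`, `translation_direction`, `rot_z`) are hypothetical. A two-view method that estimates the essential matrix E = [t]x R cannot distinguish t from s·t, because E is only defined up to scale.

```python
import math

Matrix = list[list[float]]
Vector = list[float]


def transpose(a: Matrix) -> Matrix:
    """Transpose a 3x3 matrix (the inverse of a rotation matrix)."""
    return [[a[j][i] for j in range(3)] for i in range(3)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two 3x3 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def matvec(a: Matrix, v: Vector) -> Vector:
    """Apply a 3x3 matrix to a 3-vector."""
    return [sum(a[i][k] * v[k] for k in range(3)) for i in range(3)]


def rot_z(theta: float) -> Matrix:
    """Rotation by theta radians about the z axis (used only for the example)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def relative_pose(r1: Matrix, t1: Vector, r2: Matrix, t2: Vector) -> tuple[Matrix, Vector]:
    """Pose of camera 2 relative to camera 1 (hypothetical helper).

    Assumes world-to-camera poses x_cam = R @ x_world + t, so that
    x_2 = R_rel @ x_1 + t_rel with R_rel = R2 R1^T and t_rel = t2 - R_rel t1.
    """
    r_rel = matmul(r2, transpose(r1))
    rotated_t1 = matvec(r_rel, t1)
    t_rel = [t2[i] - rotated_t1[i] for i in range(3)]
    return r_rel, t_rel


def translation_direction(t: Vector) -> Vector:
    """Unit translation direction: all that an up-to-scale method can recover."""
    norm = math.sqrt(sum(x * x for x in t))
    return [x / norm for x in t]


if __name__ == "__main__":
    identity = rot_z(0.0)
    # Two cameras 3 units apart along x, then the same scene scaled by 2.
    _, t_small = relative_pose(identity, [1.0, 0.0, 0.0], identity, [4.0, 0.0, 0.0])
    _, t_large = relative_pose(identity, [2.0, 0.0, 0.0], identity, [8.0, 0.0, 0.0])
    print(t_small, t_large)  # full translations differ with scene scale
    print(translation_direction(t_small), translation_direction(t_large))  # directions agree
```

The two runs differ only in scene scale: the full translations are [3, 0, 0] and [6, 0, 0], while their unit directions are identical. This is the ambiguity that an up-to-scale pipeline cannot resolve and that a full-translation estimate avoids.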
