RelMobNet: End-to-end relative camera pose estimation using a robust two-stage training
Praveen Kumar Rajendran, Sumit Mishra, Luiz Felipe Vecchietti, Dongsoo Har

TL;DR
This paper introduces RelMobNet, an end-to-end siamese network for relative camera pose estimation that improves accuracy and generalization through a novel two-stage training process, outperforming existing CNN-based methods.
Contribution
The paper proposes a new two-stage training approach for a siamese network that enhances translation accuracy and generalization in relative pose estimation without relying on camera parameters.
Findings
Improves translation vector estimation by up to 52.27% on individual Cambridge Landmarks scenes, relative to one-stage CNN-based methods such as RPNet and RCPNet.
Demonstrates better generalization across different scene styles using GAN-based augmentation.
Provides qualitative analysis of epipolar lines aligning with ground truth poses.
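The epipolar-line check mentioned in the findings can be illustrated with a small sketch. It is not the authors' code. It assumes normalized image coordinates (so no camera intrinsics are needed) and the convention X2 = R X1 + t for the relative pose. The helper names (`rot_y`, `epipolar_line`) and all numeric values are hypothetical. With the correct pose, a point in the second image lies on the epipolar line of its match from the first image.

```python
import math

def cross(a, b):
    """Cross product of two 3-vectors."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def mat_vec(M, v):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [dot(row, v) for row in M]

def rot_y(angle):
    """Rotation matrix about the y axis (angle in radians)."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def epipolar_line(R, t, x1):
    """Epipolar line in image 2 for a normalized homogeneous point x1 in image 1.

    Uses l = E x1 with E = [t]_x R, which equals t x (R x1).
    """
    return cross(t, mat_vec(R, x1))

# Hypothetical relative pose (assumed convention: X2 = R X1 + t).
R = rot_y(math.radians(10.0))
t = [0.5, 0.0, 0.1]

# A 3D point in camera-1 coordinates and its projections into both views.
X1 = [0.3, -0.2, 4.0]
X2 = [a + b for a, b in zip(mat_vec(R, X1), t)]
x1 = [X1[0] / X1[2], X1[1] / X1[2], 1.0]
x2 = [X2[0] / X2[2], X2[1] / X2[2], 1.0]

# With the true pose, x2 lies on the line, so the residual is close to zero.
residual = dot(epipolar_line(R, t, x1), x2)

# With a wrong rotation, the line no longer passes through x2.
R_bad = rot_y(math.radians(20.0))
residual_bad = dot(epipolar_line(R_bad, t, x1), x2)
print(residual, residual_bad)
```

A predicted pose can be judged qualitatively in the same way: the closer its epipolar lines pass to the matching points, the closer it is to the ground-truth pose.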
Abstract
Relative camera pose estimation, i.e. estimating the translation and rotation vectors from a pair of images taken at different locations, is an important part of systems in augmented reality and robotics. In this paper, we present an end-to-end relative camera pose estimation network using a siamese architecture that is independent of camera parameters. The network is trained on the Cambridge Landmarks data, using four individual scene datasets and a dataset combining the four scenes. To improve generalization, we propose a novel two-stage training that removes the need for a hyperparameter to balance the translation and rotation loss scale. The proposed method is compared with one-stage training CNN-based methods such as RPNet and RCPNet, and we demonstrate that the proposed model improves translation vector estimation by 16.11%, 28.88%, and 52.27% on the Kings College, Old Hospital,…
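To make the balancing hyperparameter concrete, here is a minimal sketch of the weighted loss commonly used by one-stage pose regressors: translation error plus a factor `beta` times rotation error. The abstract does not give the exact loss used by RelMobNet or its baselines, so the function `weighted_pose_loss`, the quaternion rotation encoding, and all values are assumptions for illustration only.

```python
import math

def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def normalize(q):
    """Scale a quaternion to unit length."""
    n = math.sqrt(sum(c * c for c in q))
    return [c / n for c in q]

def weighted_pose_loss(t_pred, q_pred, t_true, q_true, beta):
    """Translation error plus beta times rotation (quaternion) error.

    beta is the hand-tuned balancing hyperparameter (assumed form).
    """
    return l2(t_pred, t_true) + beta * l2(normalize(q_pred), q_true)

# Hypothetical ground truth and prediction for one image pair.
t_true, q_true = [1.0, 0.0, 2.0], [1.0, 0.0, 0.0, 0.0]
t_pred, q_pred = [1.5, 0.0, 2.0], [0.9, 0.1, 0.0, 0.0]

# The same prediction is scored very differently depending on beta.
for beta in (1.0, 10.0, 100.0):
    print(beta, weighted_pose_loss(t_pred, q_pred, t_true, q_true, beta))
```

Because the two error terms live on different scales, the choice of `beta` decides which one dominates the gradient. This is the tuning burden the abstract says the two-stage training removes.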
Taxonomy
Topics: Advanced Vision and Imaging · Robotics and Sensor-Based Localization · Advanced Image and Video Retrieval Techniques
Methods: Siamese Network
