TL;DR
This paper introduces an end-to-end trainable framework for camera pose estimation that combines learnable modules with geometric optimization, achieving accuracy and robustness comparable to traditional methods while improving generalization.
Contribution
The authors develop a fully trainable pipeline integrating detection, feature extraction, matching, and outlier rejection, optimized directly for geometric pose estimation.
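Of the stages listed above, matching is the easiest to show on its own. The sketch below uses a dual-softmax matcher over descriptor similarities. This is a common differentiable choice, but the summary does not say which matcher the paper uses. `dual_softmax_matches`, `temperature`, and `threshold` are hypothetical names. The code is plain Python, so it only computes values. In an autodiff framework, the same softmax operations would let gradients flow back to the descriptors.

```python
import math
from typing import List, Tuple


def _softmax(values: List[float]) -> List[float]:
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dual_softmax_matches(
    desc_a: List[List[float]],
    desc_b: List[List[float]],
    temperature: float = 0.1,
    threshold: float = 0.2,
) -> List[Tuple[int, int, float]]:
    """Return (i, j, confidence) for mutual best matches between two descriptor sets.

    Hypothetical helper: descriptors are assumed to be L2-normalized already.
    """
    if not desc_a or not desc_b:
        return []
    # Scaled similarity matrix, sim[i][j] = <a_i, b_j> / temperature.
    sim = [[sum(x * y for x, y in zip(a, b)) / temperature for b in desc_b] for a in desc_a]
    row_sm = [_softmax(row) for row in sim]  # normalize over candidates in B
    col_sm = [_softmax(list(col)) for col in zip(*sim)]  # normalize over candidates in A
    # Dual-softmax confidence: high only if i prefers j AND j prefers i.
    conf = [[row_sm[i][j] * col_sm[j][i] for j in range(len(desc_b))] for i in range(len(desc_a))]

    matches = []
    for i, row in enumerate(conf):
        j = max(range(len(row)), key=row.__getitem__)
        best_i_for_j = max(range(len(conf)), key=lambda k: conf[k][j])
        # Keep mutual nearest neighbours whose confidence clears the threshold.
        if best_i_for_j == i and row[j] > threshold:
            matches.append((i, j, row[j]))
    return matches
```

The product of the two softmaxes is close to 1 only for an unambiguous mutual best match. A simple threshold on it therefore acts as a first, soft filter before a dedicated outlier-rejection stage.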
Findings
Achieves pose estimation accuracy comparable to classic methods.
Improves generalizability to unseen datasets through end-to-end training.
Outperforms existing learning-based approaches in robustness and accuracy.
Abstract
Estimating relative camera poses from consecutive frames is a fundamental problem in visual odometry (VO) and simultaneous localization and mapping (SLAM), where classic methods consisting of hand-crafted features and sampling-based outlier rejection have been a dominant choice for over a decade. Although multiple works propose to replace these modules with learning-based counterparts, most have not yet been as accurate, robust and generalizable as conventional methods. In this paper, we design an end-to-end trainable framework consisting of learnable modules for detection, feature extraction, matching and outlier rejection, while directly optimizing for the geometric pose objective. We show both quantitatively and qualitatively that pose estimation performance may be achieved on par with the classic pipeline. Moreover, we are able to show by end-to-end training, the key components of…
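A standard weighted eight-point solver can illustrate how outlier rejection feeds a geometric pose objective. Each correspondence's epipolar constraint is scaled by an inlier weight, assumed here to come from a learned outlier-rejection module. The essential matrix is then the minimizer of the weighted algebraic error. Because the solve reduces to an eigen-decomposition, gradients could reach the weights in an autodiff framework. This is a NumPy sketch under those assumptions, without gradients. The summary does not describe the paper's actual solver or loss, and `weighted_eight_point` is a hypothetical name.

```python
import numpy as np


def weighted_eight_point(x1: np.ndarray, x2: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Estimate an essential matrix E with x2^T E x1 = 0 from weighted correspondences.

    x1, x2: (N, 2) normalized image coordinates (camera intrinsics already removed).
    w: (N,) non-negative inlier weights; at least 8 should be clearly non-zero.
    """
    n = x1.shape[0]
    h1 = np.hstack([x1, np.ones((n, 1))])  # homogeneous coordinates, image 1
    h2 = np.hstack([x2, np.ones((n, 1))])  # homogeneous coordinates, image 2
    # Row k is the outer product h2_k h1_k^T flattened, so A[k] @ E.ravel() = h2_k^T E h1_k.
    A = np.einsum("ni,nj->nij", h2, h1).reshape(n, 9)
    # Weighted normal matrix: minimizes sum_k w_k * (h2_k^T E h1_k)^2 subject to ||E|| = 1.
    M = A.T @ (w[:, None] * A)
    _, eigvecs = np.linalg.eigh(M)  # eigenvalues in ascending order
    E = eigvecs[:, 0].reshape(3, 3)
    # Project onto the essential manifold: two equal singular values, third zero.
    U, S, Vt = np.linalg.svd(E)
    s = (S[0] + S[1]) / 2.0
    E = U @ np.diag([s, s, 0.0]) @ Vt
    return E / np.linalg.norm(E)
```

Correspondences with zero weight drop out of the normal matrix entirely. A network that drives outlier weights toward zero therefore recovers the same essential matrix as a solver given only inliers. Recovering R and t from E (the four-fold decomposition plus a cheirality check) is the usual next step and is omitted here.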
