TL;DR
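This paper introduces a fast, robust method for estimating the relative pose of two calibrated cameras from deep-learned monocular depth and a single affine correspondence. Because robust estimation then runs in linear time in the number of correspondences, large-scale structure-from-motion (SfM) becomes significantly faster.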
Contribution
It presents a novel 1AC+D (one affine correspondence plus depth) solver that combines deep-learned depth with affine features, so relative pose can be estimated inside 1-point RANSAC, enabling faster large-scale 3D reconstruction.
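Why does a one-correspondence minimal sample matter? The standard RANSAC bound says how many random samples are needed so that, with a chosen confidence, at least one sample contains only inliers. The sketch below evaluates that textbook formula for a sample size of 1 (as in a 1-point solver such as 1AC+D) and, as an assumed point of comparison, a sample size of 5 (the classical 5-point essential-matrix solver). It illustrates the general scaling only; it does not reproduce the paper's experiments.

```python
import math


def ransac_iterations(inlier_ratio: float, sample_size: int, confidence: float = 0.99) -> int:
    """Textbook RANSAC bound: number of random minimal samples needed so that,
    with probability `confidence`, at least one sample is outlier-free."""
    if not 0.0 < inlier_ratio <= 1.0:
        raise ValueError("inlier_ratio must be in (0, 1]")
    # Probability that one minimal sample consists of inliers only.
    p_all_inliers = inlier_ratio ** sample_size
    if p_all_inliers >= 1.0:
        return 1
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_all_inliers))


# Compare a 1-point minimal sample with an (assumed) 5-point baseline.
for w in (0.5, 0.25, 0.1):
    print(f"inlier ratio {w}: 1-point -> {ransac_iterations(w, 1)}, "
          f"5-point -> {ransac_iterations(w, 5)}")
```

The gap widens quickly as the inlier ratio drops, because the per-sample success probability is the inlier ratio raised to the sample size.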
Findings
Achieves similar accuracy to traditional methods
Significantly reduces computation time for pose estimation
Effective in large-scale SfM pipelines
Abstract
We propose a new approach for combining deep-learned non-metric monocular depth with affine correspondences (ACs) to estimate the relative pose of two calibrated cameras from a single correspondence. From the depth information and the affine features, we derive two new constraints on the camera pose. The proposed solver is usable within 1-point RANSAC approaches. Thus, the processing time of the robust estimation is linear in the number of correspondences and, therefore, orders of magnitude faster than with traditional approaches. The proposed 1AC+D solver is tested both on synthetic data and on 110,395 publicly available real image pairs, where we used an off-the-shelf monocular depth network to provide up-to-scale depth per pixel. The proposed 1AC+D solver leads to similar accuracy as traditional approaches while being significantly faster. When solving large-scale problems, e.g.,…
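The linear-time claim follows from the shape of a 1-point RANSAC loop: each hypothesis comes from one correspondence, and scoring it touches every correspondence once, so the total cost is (number of iterations) × (number of correspondences). The sketch below shows that loop on a deliberately simple toy model, a 2D translation, which one point match fully determines. The translation model, the function name `one_point_ransac_translation`, and its parameters are hypothetical stand-ins chosen for illustration; they are not the paper's 1AC+D solver, which estimates full relative pose from an affine correspondence plus depth.

```python
import numpy as np


def one_point_ransac_translation(src, dst, threshold=1.0, num_iters=20, seed=0):
    """Toy 1-point RANSAC (hypothetical stand-in, not the 1AC+D solver):
    estimate a 2D translation t with dst ~= src + t.
    Each hypothesis is O(1) to generate and O(n) to score,
    so the total cost is O(num_iters * n)."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    rng = np.random.default_rng(seed)
    best_t, best_inliers = None, np.zeros(len(src), dtype=bool)
    for _ in range(num_iters):
        i = rng.integers(len(src))            # minimal sample: a single correspondence
        t = dst[i] - src[i]                   # closed-form "solver" from one match
        residuals = np.linalg.norm(dst - (src + t), axis=1)  # score all n matches
        inliers = residuals < threshold
        if inliers.sum() > best_inliers.sum():
            best_t, best_inliers = t, inliers
    if best_t is not None and best_inliers.any():
        # Least-squares refinement on the inliers: the mean offset.
        best_t = (dst[best_inliers] - src[best_inliers]).mean(axis=0)
    return best_t, best_inliers


# Synthetic data: 200 matches, true shift (3, -2), first half replaced by outliers.
rng = np.random.default_rng(42)
src = rng.uniform(0, 100, size=(200, 2))
dst = src + np.array([3.0, -2.0]) + rng.normal(0, 0.1, size=src.shape)
dst[:100] = rng.uniform(0, 100, size=(100, 2))
t, inliers = one_point_ransac_translation(src, dst)
print(t, inliers.sum())
```

In a real pipeline, `num_iters` would come from the usual confidence bound and the scoring step would use a pose-specific residual (for example, an epipolar error) instead of a 2D distance.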
