TL;DR
RidgeSfM reconstructs large indoor scenes by estimating dense depth maps and camera poses jointly, inside a single modified bundle adjustment that uses only sparse keypoint matches. The authors report that it outperforms existing large-scale SfM pipelines.
Contribution
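The paper combines deep monocular depth prediction with a modified bundle adjustment. Each depth map is parametrized as a linear combination of a small number of basis "depth-planes" predicted by a deep net, so optimizing the per-frame coefficients and camera poses over sparse keypoint matches yields dense reconstructions.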
Findings
Outperforms state-of-the-art large-scale SfM pipelines.
Capable of aligning hundreds of frames efficiently.
Produces dense depth maps from an optimization over sparse keypoints alone, because the fitted basis-plane coefficients apply to every pixel.
Abstract
We consider the problem of simultaneously estimating a dense depth map and camera pose for a large set of images of an indoor scene. While classical SfM pipelines rely on a two-step approach where cameras are first estimated using a bundle adjustment in order to ground the ensuing multi-view stereo stage, both our poses and dense reconstructions are a direct output of an altered bundle adjuster. To this end, we parametrize each depth map with a linear combination of a limited number of basis "depth-planes" predicted in a monocular fashion by a deep net. Using a set of high-quality sparse keypoint matches, we optimize over the per-frame linear combinations of depth planes and camera poses to form a geometrically consistent cloud of keypoints. Although our bundle adjustment only considers sparse keypoints, the inferred linear coefficients of the basis planes immediately give us dense…
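The linear depth parametrization described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The basis planes here are random arrays standing in for network predictions, the function names `compose_depth` and `fit_coeffs` are hypothetical, and the sparse keypoint depths are assumed to be known so that the coefficients can be fitted by ordinary least squares. In the paper, the coefficients are instead optimized jointly with the camera poses inside the bundle adjuster.

```python
import numpy as np

def compose_depth(basis, coeffs):
    """Dense depth map as a linear combination of basis depth planes.

    basis:  (K, H, W) array of basis planes
    coeffs: (K,) array of per-frame linear coefficients
    returns (H, W) dense depth map
    """
    return np.tensordot(coeffs, basis, axes=1)

def fit_coeffs(basis, keypoints, target_depths):
    """Least-squares fit of the coefficients using sparse keypoints only.

    keypoints:     (N, 2) integer (row, col) pixel positions
    target_depths: (N,) depths at those pixels (assumed known in this sketch)
    """
    rows, cols = keypoints[:, 0], keypoints[:, 1]
    A = basis[:, rows, cols].T  # (N, K): basis values sampled at the keypoints
    coeffs, *_ = np.linalg.lstsq(A, target_depths, rcond=None)
    return coeffs

# Synthetic setup: random planes stand in for the network's predictions.
rng = np.random.default_rng(0)
K, H, W = 4, 24, 32
basis = rng.uniform(0.5, 2.0, size=(K, H, W))
true_coeffs = np.array([1.0, 0.5, -0.3, 0.2])
true_depth = compose_depth(basis, true_coeffs)

# Sample 50 sparse keypoints and read the depth at each one.
keypoints = np.stack([rng.integers(0, H, 50), rng.integers(0, W, 50)], axis=1)
sparse_depths = true_depth[keypoints[:, 0], keypoints[:, 1]]

# Coefficients fitted on the sparse points give the dense map for all pixels.
est_coeffs = fit_coeffs(basis, keypoints, sparse_depths)
dense_depth = compose_depth(basis, est_coeffs)
```

Because each frame has only K unknown coefficients, a few sparse constraints determine the full H × W depth map, which is why an optimization over keypoints alone can return dense output.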
