TL;DR
This paper introduces a method to refine monocular depth maps by integrating multi-view data and differentiable rendering, resulting in more accurate and consistent 3D reconstructions suitable for complex indoor scenes.
Contribution
The paper presents an analysis-by-synthesis optimization framework that refines monocular depth estimates by enforcing multi-view consistency through photometric and geometric losses computed with differentiable rendering.
Findings
Outperforms existing multi-view depth reconstruction methods on challenging datasets.
Produces detailed, high-quality, view-consistent depth maps.
Effective in indoor scenarios with complex geometries.
Abstract
Accurate depth estimation is at the core of many applications in computer graphics, vision, and robotics. Current state-of-the-art monocular depth estimators, trained on extensive datasets, generalize well but lack 3D consistency needed for many applications. In this paper, we combine the strength of those generalizing monocular depth estimation techniques with multi-view data by framing this as an analysis-by-synthesis optimization problem to lift and refine such relative depth maps to accurate error-free depth maps. After an initial global scale estimation through structure-from-motion point clouds, we further refine the depth map through optimization enforcing multi-view consistency via photometric and geometric losses with differentiable rendering of the meshed depth map. In a two-stage optimization, scaling is further refined first, and afterwards artifacts and errors in the depth…
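The initial global scale estimation mentioned in the abstract can be illustrated with a small sketch. It assumes the sparse structure-from-motion points have already been projected into the image, so each one pairs a metric depth with the relative depth predicted at that pixel. The function name `estimate_global_scale`, the closed-form least-squares fit, and the validity mask are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def estimate_global_scale(rel_depth, sfm_depth):
    """Least-squares scale s minimizing ||s * rel_depth - sfm_depth||^2.

    rel_depth: relative depths sampled at the pixels where SfM points project.
    sfm_depth: depths of those SfM points in the camera frame.
    Both inputs are hypothetical; the paper's actual pipeline may differ.
    """
    rel = np.asarray(rel_depth, dtype=np.float64).ravel()
    ref = np.asarray(sfm_depth, dtype=np.float64).ravel()

    # Keep only finite, positive correspondences.
    valid = np.isfinite(rel) & np.isfinite(ref) & (rel > 0) & (ref > 0)
    if not valid.any():
        raise ValueError("no valid depth correspondences")
    rel, ref = rel[valid], ref[valid]

    # Closed-form solution of the one-parameter least-squares problem.
    return float(np.dot(rel, ref) / np.dot(rel, rel))
```

Multiplying the relative depth map by the returned scale gives only a coarse metric initialization. According to the abstract, the subsequent two-stage optimization refines the scaling further and then corrects remaining artifacts.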
