TL;DR
This paper presents a method for improving depth estimation in indoor scenes by leveraging visual-inertial SLAM data, surface normals, and gravity-based image warping to enhance accuracy over existing approaches.
Contribution
The authors introduce a novel approach that combines VI-SLAM point clouds, surface normal estimation, and gravity-guided image warping to improve depth completion in indoor environments.
Findings
Outperforms state-of-the-art methods on ScanNet and NYUv2 datasets.
Uses gravity estimates to improve surface normal and depth estimation accuracy.
Effectively compensates for low-density, noisy, and non-uniform point clouds from VI-SLAM.
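The gravity-guided warping mentioned above can be illustrated geometrically. The following is a minimal sketch, not the authors' implementation: it assumes a pinhole camera with intrinsics `K`, a gravity direction `g_cam` expressed in the camera frame (e.g., from the VI-SLAM estimator), and a reference gravity direction along the camera's +y axis. The function names and the choice of reference direction are hypothetical. A pure rotation `R` of the camera induces the image homography `H = K R K^{-1}`.

```python
import numpy as np

def rotation_aligning(a, b):
    """Return the smallest rotation R such that R @ a is parallel to b."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    v = np.cross(a, b)            # rotation axis scaled by sin(angle)
    c = float(np.dot(a, b))       # cos(angle)
    s = np.linalg.norm(v)         # sin(angle)
    if s < 1e-12:
        if c > 0:
            return np.eye(3)      # already aligned
        # Antiparallel: rotate by pi about any axis perpendicular to a.
        axis = np.cross(a, [1.0, 0.0, 0.0])
        if np.linalg.norm(axis) < 1e-6:
            axis = np.cross(a, [0.0, 1.0, 0.0])
        axis = axis / np.linalg.norm(axis)
        return 2.0 * np.outer(axis, axis) - np.eye(3)
    vx = np.array([[0.0, -v[2], v[1]],
                   [v[2], 0.0, -v[0]],
                   [-v[1], v[0], 0.0]])
    # Rodrigues formula written in terms of the unnormalized axis v.
    return np.eye(3) + vx + vx @ vx * ((1.0 - c) / s**2)

def gravity_homography(K, g_cam, g_ref=(0.0, 1.0, 0.0)):
    """Homography that warps the image as if the camera were rotated
    so that the measured gravity g_cam coincides with g_ref (assumed)."""
    R = rotation_aligning(g_cam, g_ref)
    return K @ R @ np.linalg.inv(K)
```

The resulting 3x3 matrix could then be applied to the image with a standard perspective-warp routine such as OpenCV's `cv2.warpPerspective`; how the paper itself chooses the reference direction and applies the warp is not stated in this summary.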
Abstract
This paper addresses the problem of learning to complete a scene's depth from sparse depth points and images of indoor scenes. Specifically, we study the case in which the sparse depth is computed from a visual-inertial simultaneous localization and mapping (VI-SLAM) system. The resulting point cloud has low density, is noisy, and has a non-uniform spatial distribution compared to the input from active depth sensors, e.g., LiDAR or Kinect. Since VI-SLAM produces point clouds only over textured areas, we compensate for the missing depth of the low-texture surfaces by leveraging their planar structures and their surface normals, which are an important intermediate representation. The pre-trained surface normal network, however, suffers from large performance degradation when there is a significant difference in the viewing direction (especially the roll angle) of the test image as…
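To see why a surface normal helps fill in depth on a low-texture planar surface, consider a purely geometric illustration (a sketch under stated assumptions, not the learned pipeline described in the paper). If one sparse point with known depth lies on a plane with unit normal `n`, the plane satisfies `n^T X = d`, and the depth at any other pixel `u` on that plane is `z = d / (n^T K^{-1} [u, 1])`. The function name `plane_depth` and its parameters are hypothetical.

```python
import numpy as np

def plane_depth(K, normal, anchor_px, anchor_depth, query_px):
    """Depth at query pixels, assuming they lie on the plane that passes
    through the anchor point and has the given normal (camera frame)."""
    K_inv = np.linalg.inv(K)
    normal = np.asarray(normal, dtype=float)
    # Back-project the anchor pixel; the ray has z = 1, so scaling by the
    # depth gives the 3D point.
    ray_a = K_inv @ np.array([anchor_px[0], anchor_px[1], 1.0])
    X_a = anchor_depth * ray_a
    d = normal @ X_a                      # plane offset in n^T X = d
    q = np.atleast_2d(np.asarray(query_px, dtype=float))
    rays = (K_inv @ np.column_stack([q, np.ones(len(q))]).T).T
    denom = rays @ normal
    # Rays (nearly) parallel to the plane have no finite intersection.
    denom = np.where(np.abs(denom) < 1e-9, np.nan, denom)
    return d / denom
```

A single reliable VI-SLAM point on a wall or floor, together with a predicted normal, is therefore enough to propagate depth across that surface in this idealized setting; in practice both the point and the normal are noisy.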
