Fast Monocular Scene Reconstruction with Global-Sparse Local-Dense Grids
Wei Dong, Chris Choy, Charles Loop, Or Litany, Yuke Zhu, Anima Anandkumar

TL;DR
This paper introduces a fast, sparse voxel grid-based method for monocular indoor scene reconstruction that avoids MLPs, enabling rapid training and rendering while maintaining high accuracy.
Contribution
It proposes a globally sparse, locally dense voxel grid structure, together with a scale calibration algorithm and differentiable rendering, for efficient monocular scene reconstruction.
Findings
10x faster training than neural implicit methods
100x faster rendering than neural implicit methods
Accuracy comparable to state-of-the-art neural implicit methods
Abstract
Indoor scene reconstruction from monocular images has long been sought after by augmented reality and robotics developers. Recent advances in neural field representations and monocular priors have led to remarkable results in scene-level surface reconstructions. The reliance on Multilayer Perceptrons (MLP), however, significantly limits speed in training and rendering. In this work, we propose to directly use signed distance function (SDF) in sparse voxel block grids for fast and accurate scene reconstruction without MLPs. Our globally sparse and locally dense data structure exploits surfaces' spatial sparsity, enables cache-friendly queries, and allows direct extensions to multi-modal data such as color and semantic labels. To apply this representation to monocular scene reconstruction, we develop a scale calibration algorithm for fast geometric initialization from monocular depth…
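The "globally sparse, locally dense" idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `SparseBlockGrid`, the constants `BLOCK`, `VOXEL`, and `TRUNC`, and the use of a plain Python dictionary as the hash map are all assumptions made for illustration. The sketch stores SDF values in small dense blocks that are allocated only where voxels are written, and reads them back with trilinear interpolation.

```python
import math

BLOCK = 8      # voxels per block edge (assumed value, for illustration)
VOXEL = 0.05   # voxel size in metres (assumed value)
TRUNC = 1.0    # SDF returned for unallocated space (assumed value)


class SparseBlockGrid:
    """Hypothetical sketch: hash map of block coordinates to dense SDF blocks."""

    def __init__(self):
        # Globally sparse: only blocks that were written to exist here.
        self.blocks = {}

    def _locate(self, ix, iy, iz):
        # Split a global voxel index into (block key, offset inside the block).
        # Python's floor division keeps this correct for negative indices.
        key = (ix // BLOCK, iy // BLOCK, iz // BLOCK)
        off = ((ix % BLOCK) * BLOCK + (iy % BLOCK)) * BLOCK + (iz % BLOCK)
        return key, off

    def set_voxel(self, ix, iy, iz, sdf):
        key, off = self._locate(ix, iy, iz)
        block = self.blocks.get(key)
        if block is None:
            # Locally dense: one contiguous list of BLOCK^3 values per block.
            block = [TRUNC] * BLOCK ** 3
            self.blocks[key] = block
        block[off] = sdf

    def get_voxel(self, ix, iy, iz):
        key, off = self._locate(ix, iy, iz)
        block = self.blocks.get(key)
        return TRUNC if block is None else block[off]

    def query(self, x, y, z):
        """Trilinearly interpolate the SDF at a metric point (x, y, z)."""
        gx, gy, gz = x / VOXEL, y / VOXEL, z / VOXEL
        x0, y0, z0 = math.floor(gx), math.floor(gy), math.floor(gz)
        fx, fy, fz = gx - x0, gy - y0, gz - z0
        value = 0.0
        for dx in (0, 1):
            for dy in (0, 1):
                for dz in (0, 1):
                    w = ((fx if dx else 1.0 - fx)
                         * (fy if dy else 1.0 - fy)
                         * (fz if dz else 1.0 - fz))
                    value += w * self.get_voxel(x0 + dx, y0 + dy, z0 + dz)
        return value


# Usage: write the SDF of a horizontal plane at z = 0.5 m into a thin slab.
grid = SparseBlockGrid()
for ix in range(16):
    for iy in range(16):
        for iz in range(8, 13):
            grid.set_voxel(ix, iy, iz, iz * VOXEL - 0.5)

sdf_on_plane = grid.query(0.2, 0.2, 0.5)   # close to zero on the surface
num_blocks = len(grid.blocks)              # only blocks near the slab exist
```

Because only the blocks touched by the slab are allocated, memory scales with surface area rather than scene volume, while each lookup inside a block is a simple array index. A GPU implementation would replace the dictionary with a parallel hash map, which is beyond what this sketch shows.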
Taxonomy
Topics: Advanced Vision and Imaging · Robotics and Sensor-Based Localization · Computer Graphics and Visualization Techniques
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
