Fast Monocular Scene Reconstruction with Global-Sparse Local-Dense Grids

Wei Dong; Chris Choy; Charles Loop; Or Litany; Yuke Zhu; Anima Anandkumar

arXiv:2305.13220 · cs.CV · May 23, 2023 · 1 citation

TL;DR

This paper introduces a fast, sparse voxel grid-based method for monocular indoor scene reconstruction that avoids MLPs, enabling rapid training and rendering while maintaining high accuracy.

Contribution

The paper proposes a globally sparse, locally dense voxel grid structure, together with a scale calibration algorithm and differentiable rendering, for efficient monocular scene reconstruction.
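As a rough illustration of what scale calibration has to solve, the sketch below fits one scale and one shift that map a monocular depth prediction onto a few reference depths by linear least squares. This is a generic baseline and not the paper's algorithm, whose details are not given on this page; the function name `fit_scale_shift` and the idea of sparse reference depths (for example from structure-from-motion) are assumptions made for the example.

```python
import numpy as np


def fit_scale_shift(pred_depth, ref_depth):
    """Least-squares fit of ref ~= scale * pred + shift (hypothetical helper).

    pred_depth: monocular depth predictions at some pixels (arbitrary scale).
    ref_depth:  reference depths at the same pixels (assumed to be available).
    """
    pred = np.asarray(pred_depth, dtype=np.float64).ravel()
    ref = np.asarray(ref_depth, dtype=np.float64).ravel()
    # Design matrix with one column for the scale and one for the shift.
    design = np.stack([pred, np.ones_like(pred)], axis=1)
    solution, *_ = np.linalg.lstsq(design, ref, rcond=None)
    scale, shift = solution
    return float(scale), float(shift)


# Synthetic check: the reference is an exact affine function of the prediction.
pred = np.array([1.0, 2.0, 3.0, 4.0])
ref = 2.5 * pred + 0.3
scale, shift = fit_scale_shift(pred, ref)
```

A single global scale and shift per image is the simplest possible model; it is shown only to make the scale ambiguity of monocular depth concrete.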

Findings

01  10x faster training than neural implicit methods

02  100x faster rendering than neural implicit methods

03  Accuracy comparable to state-of-the-art methods

Abstract

Indoor scene reconstruction from monocular images has long been sought after by augmented reality and robotics developers. Recent advances in neural field representations and monocular priors have led to remarkable results in scene-level surface reconstructions. The reliance on Multilayer Perceptrons (MLP), however, significantly limits speed in training and rendering. In this work, we propose to directly use signed distance function (SDF) in sparse voxel block grids for fast and accurate scene reconstruction without MLPs. Our globally sparse and locally dense data structure exploits surfaces' spatial sparsity, enables cache-friendly queries, and allows direct extensions to multi-modal data such as color and semantic labels. To apply this representation to monocular scene reconstruction, we develop a scale calibration algorithm for fast geometric initialization from monocular depth…
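To make the "globally sparse, locally dense" idea concrete, here is a minimal CPU sketch: a hash map keyed by block coordinates, where each allocated block is a small dense array of SDF values. It illustrates the general data-structure pattern only and is not the authors' implementation; the class name `SparseBlockGrid`, the block resolution of 8, and the default (truncation) value for unallocated space are all assumptions made for the example.

```python
import numpy as np


class SparseBlockGrid:
    """Hash map of block coordinates -> dense (B, B, B) SDF arrays (illustrative)."""

    def __init__(self, voxel_size=0.05, block_res=8, truncation=1.0):
        self.voxel_size = voxel_size
        self.block_res = block_res
        self.truncation = truncation  # value reported for unallocated space
        self.blocks = {}              # globally sparse: only touched blocks exist

    def _split(self, voxel_idx):
        # Floor division and modulo keep negative indices consistent.
        block = tuple(int(v) // self.block_res for v in voxel_idx)
        local = tuple(int(v) % self.block_res for v in voxel_idx)
        return block, local

    def set_sdf(self, voxel_idx, value):
        block, local = self._split(voxel_idx)
        if block not in self.blocks:
            shape = (self.block_res,) * 3
            # Locally dense: one contiguous array per allocated block.
            self.blocks[block] = np.full(shape, self.truncation, dtype=np.float32)
        self.blocks[block][local] = value

    def get_sdf(self, voxel_idx):
        block, local = self._split(voxel_idx)
        dense = self.blocks.get(block)
        return self.truncation if dense is None else float(dense[local])

    def query(self, point):
        """Trilinearly interpolate the SDF at a world-space point."""
        p = np.asarray(point, dtype=np.float64) / self.voxel_size
        base = np.floor(p).astype(int)
        frac = p - base
        value = 0.0
        for dx in (0, 1):
            for dy in (0, 1):
                for dz in (0, 1):
                    weight = ((frac[0] if dx else 1.0 - frac[0])
                              * (frac[1] if dy else 1.0 - frac[1])
                              * (frac[2] if dz else 1.0 - frac[2]))
                    value += weight * self.get_sdf(base + (dx, dy, dz))
        return value


# Fill a small region with the SDF of the plane z = 0.25 and query on the plane.
grid = SparseBlockGrid(voxel_size=0.1)
for x in range(4):
    for y in range(4):
        for z in range(6):
            grid.set_sdf((x, y, z), z * grid.voxel_size - 0.25)
on_plane = grid.query((0.15, 0.15, 0.25))
```

Because the eight corner lookups of a query usually fall in the same block, most reads hit one contiguous array, which is the cache-friendliness the abstract refers to. Storing extra per-voxel channels (color, semantic labels) would amount to adding more dense arrays per block under the same keys.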

Taxonomy

Topics: Advanced Vision and Imaging · Robotics and Sensor-Based Localization · Computer Graphics and Visualization Techniques

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings