A Two-Stage Masked Autoencoder Based Network for Indoor Depth Completion
Kailai Sun, Zhou Yang, Qianchuan Zhao

TL;DR
This paper introduces a novel two-stage Transformer-based network utilizing masked autoencoder pre-training for indoor depth completion, significantly improving accuracy in complex indoor environments and benefiting 3D reconstruction tasks.
Contribution
It presents a two-step Transformer network that combines self-supervised masked-autoencoder pre-training with a token fusion decoder to improve indoor depth completion.
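The token fusion decoder is only named here, not specified. As a hypothetical illustration, the sketch below assumes that RGB and depth have already been split into aligned token sequences with the same dimension d, and fuses each RGB/depth pair by concatenation followed by a linear projection back to d dimensions. The concatenate-then-project scheme and the names fuse_tokens and linear are assumptions, not the authors' design.

```python
from typing import List

Vector = List[float]


def linear(x: Vector, weight: List[Vector], bias: Vector) -> Vector:
    """Apply y = W x + b, where weight has shape (out_dim, in_dim)."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(weight, bias)]


def fuse_tokens(rgb_tokens: List[Vector], depth_tokens: List[Vector],
                weight: List[Vector], bias: Vector) -> List[Vector]:
    """Hypothetical token fusion (assumption, not the paper's mechanism):
    concatenate the RGB and depth token at each spatial position, then
    project the 2*d-dim vector back to d dims with one linear layer."""
    if len(rgb_tokens) != len(depth_tokens):
        raise ValueError("RGB and depth must yield the same number of tokens")
    fused = []
    for rgb, dep in zip(rgb_tokens, depth_tokens):
        # List concatenation joins the two d-dim tokens into one 2*d-dim vector.
        fused.append(linear(rgb + dep, weight, bias))
    return fused
```

With d = 2 and a weight matrix that averages matching channels, for example [[0.5, 0.0, 0.5, 0.0], [0.0, 0.5, 0.0, 0.5]], fusing the RGB token [1.0, 2.0] with the depth token [3.0, 4.0] gives [2.0, 3.0]. In a real network the projection would be learned, and attention-based fusion is an equally plausible alternative.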
Findings
Achieves state-of-the-art results on the Matterport3D dataset.
Reconstructs full depth maps from RGB and incomplete depth images.
Demonstrates the approach's usefulness for indoor 3D reconstruction.
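Depth completion takes a raw depth map with holes and predicts a dense one. The following is a minimal sketch, assuming the common raw-sensor convention that missing pixels are stored as 0, of how a valid-pixel mask is built and how RMSE and MAE are typically computed only over pixels where ground truth exists. The exact evaluation protocol used on Matterport3D is not stated here, and valid_mask and masked_errors are hypothetical helpers.

```python
import math
from typing import List, Tuple

Depth = List[List[float]]


def valid_mask(depth: Depth, min_depth: float = 0.0) -> List[List[bool]]:
    """Mark pixels with a usable depth reading. Assumes missing pixels are
    stored as 0 (or any value <= min_depth), a common raw-sensor convention."""
    return [[d > min_depth for d in row] for row in depth]


def masked_errors(pred: Depth, gt: Depth) -> Tuple[float, float]:
    """Return (RMSE, MAE) computed only where the ground-truth depth is valid."""
    mask = valid_mask(gt)
    sq_sum, abs_sum, count = 0.0, 0.0, 0
    for p_row, g_row, m_row in zip(pred, gt, mask):
        for p, g, m in zip(p_row, g_row, m_row):
            if m:
                diff = p - g
                sq_sum += diff * diff
                abs_sum += abs(diff)
                count += 1
    if count == 0:
        raise ValueError("ground truth has no valid pixels")
    return math.sqrt(sq_sum / count), abs_sum / count
```

For a prediction [[1.0, 2.0], [1.0, 3.0]] against ground truth [[0.0, 1.0], [3.0, 2.0]], the top-left pixel is ignored as missing, and the remaining errors 1, -2, and 1 give an RMSE of sqrt(2) and an MAE of 4/3.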
Abstract
Depth images have a wide range of applications, such as 3D reconstruction, autonomous driving, augmented reality, robot navigation, and scene understanding. However, commodity-grade depth cameras struggle to sense depth on bright, glossy, transparent, and distant surfaces. Although existing depth completion methods have made remarkable progress, their performance is limited in complex indoor scenarios. To address these problems, we propose a two-step Transformer-based network for indoor depth completion. Unlike existing depth completion approaches, we adopt a self-supervised pre-training encoder based on the masked autoencoder to learn an effective latent representation for the missing depth values; we then propose a decoder based on a token fusion mechanism to complete (i.e., reconstruct) the full depth from the joint RGB and incomplete depth images. Compared to the existing…
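The abstract does not detail the pre-training recipe. Assuming it follows the standard masked-autoencoder setup (split the input into non-overlapping patches, hide a random subset, and train the network to reconstruct the hidden patches), the sketch below shows the patchify, random-masking, and masked-loss steps applied to a depth map. The patch size, the 0.75 mask ratio used in the example, and all function names are hypothetical.

```python
import random
from typing import List, Tuple

Depth = List[List[float]]
Token = List[float]


def patchify(depth: Depth, patch: int) -> List[Token]:
    """Split an H x W depth map into non-overlapping patch x patch tokens in
    row-major order; each token is the flattened patch."""
    h, w = len(depth), len(depth[0])
    if h % patch or w % patch:
        raise ValueError("H and W must be divisible by the patch size")
    tokens = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            tokens.append([depth[r][c]
                           for r in range(top, top + patch)
                           for c in range(left, left + patch)])
    return tokens


def random_mask(num_tokens: int, mask_ratio: float,
                rng: random.Random) -> Tuple[List[int], List[int]]:
    """Randomly choose which tokens stay visible to the encoder; the rest are
    masked and become reconstruction targets during pre-training."""
    num_keep = int(num_tokens * (1.0 - mask_ratio))
    order = list(range(num_tokens))
    rng.shuffle(order)
    return sorted(order[:num_keep]), sorted(order[num_keep:])


def masked_mse(pred: List[Token], target: List[Token], masked: List[int]) -> float:
    """MAE-style objective: mean squared error over masked tokens only."""
    total, count = 0.0, 0
    for i in masked:
        for p, t in zip(pred[i], target[i]):
            total += (p - t) ** 2
            count += 1
    if count == 0:
        raise ValueError("no masked values to score")
    return total / count
```

For example, a 4 x 4 depth map with patch=2 yields 4 tokens, and random_mask(4, 0.75, random.Random(0)) keeps 1 token visible and masks 3. Scoring the loss only on masked tokens, as in the original masked-autoencoder recipe, forces the encoder to infer hidden content instead of copying its visible input.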
Taxonomy
Topics: 3D Surveying and Cultural Heritage · Industrial Vision Systems and Defect Detection · Advanced Vision and Imaging
