A Two-Stage Masked Autoencoder Based Network for Indoor Depth Completion

Kailai Sun; Zhou Yang; Qianchuan Zhao

arXiv:2406.09792·cs.CV·June 17, 2024

TL;DR

This paper introduces a novel two-stage Transformer-based network utilizing masked autoencoder pre-training for indoor depth completion, significantly improving accuracy in complex indoor environments and benefiting 3D reconstruction tasks.

Contribution

It presents a new two-step Transformer network that combines a self-supervised pre-trained encoder with a token fusion decoder for indoor depth completion.

Findings

01. Achieves state-of-the-art results on the Matterport3D dataset.

02. Reconstructs full depth from RGB and incomplete depth images.

03. Validates the approach's usefulness for indoor 3D reconstruction.

Abstract

Depth images have a wide range of applications, such as 3D reconstruction, autonomous driving, augmented reality, robot navigation, and scene understanding. Commodity-grade depth cameras struggle to sense depth on bright, glossy, transparent, and distant surfaces. Although existing depth completion methods have achieved remarkable progress, their performance is limited when applied to complex indoor scenarios. To address these problems, we propose a two-step Transformer-based network for indoor depth completion. Unlike existing depth completion approaches, we adopt a self-supervised pre-trained encoder based on the masked autoencoder to learn an effective latent representation for the missing depth values; we then propose a decoder based on a token fusion mechanism to complete (i.e., reconstruct) the full depth jointly from the RGB and incomplete depth images. Compared to the existing…
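The masked-autoencoder idea mentioned in the abstract can be illustrated with a minimal sketch of the masking step alone: an image is split into patch tokens, a random subset is hidden, and only the visible tokens would be passed to an encoder. Everything here is an assumption for illustration, not the authors' implementation: the function names `patchify` and `random_patch_mask` are hypothetical, the 0.75 mask ratio is the default from the original masked autoencoder work rather than a value stated on this page, and the toy depth map uses 0.0 for missing readings.

```python
import random


def patchify(image, patch):
    """Split an H x W 2D list into flattened, non-overlapping patch x patch tiles (row-major order)."""
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image size must be divisible by the patch size")
    tiles = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            tiles.append([image[r][c]
                          for r in range(top, top + patch)
                          for c in range(left, left + patch)])
    return tiles


def random_patch_mask(num_patches, mask_ratio, seed=None):
    """Return (kept, masked) sorted index lists for MAE-style random masking."""
    if not 0.0 <= mask_ratio < 1.0:
        raise ValueError("mask_ratio must be in [0, 1)")
    rng = random.Random(seed)
    order = list(range(num_patches))
    rng.shuffle(order)
    num_masked = int(num_patches * mask_ratio)
    return sorted(order[num_masked:]), sorted(order[:num_masked])


# Toy 4 x 4 depth map; 0.0 marks a missing sensor reading (an assumed convention).
depth = [[1.0, 1.2, 0.0, 0.0],
         [1.1, 1.3, 0.0, 2.0],
         [0.9, 0.0, 2.1, 2.2],
         [1.0, 1.0, 2.0, 2.3]]

tokens = patchify(depth, 2)                          # four tokens of length four
kept, masked = random_patch_mask(len(tokens), 0.75, seed=0)
visible_tokens = [tokens[i] for i in kept]           # what an encoder would see
```

During pre-training, a decoder would be asked to reconstruct the tokens at the `masked` indices from the encoded visible ones. How the paper applies masking to RGB and depth inputs is not described on this page, so the sketch stops at the masking step.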

Code & Models

Repositories

kailaisun/indoor-depth-completion (PyTorch, official)

Taxonomy

Topics: 3D Surveying and Cultural Heritage · Industrial Vision Systems and Defect Detection · Advanced Vision and Imaging