Multi-Modal Masked Pre-Training for Monocular Panoramic Depth Completion

Zhiqiang Yan; Xiang Li; Kun Wang; Zhenyu Zhang; Jun Li; Jian Yang

arXiv:2203.09855·cs.CV·July 13, 2022

TL;DR

This paper introduces M^3PT, a multi-modal masked pre-training approach for panoramic depth completion, significantly improving dense depth recovery from sparse data and RGB images.

Contribution

To the authors' knowledge, this is the first application of masked pre-training to a multi-modal vision task. It improves panoramic depth completion performance without changing the network architecture.

Findings

01 Achieves up to a 51.7% reduction in MRE (mean relative error)

02 Reduces RMSE (root mean squared error) by 26.2% relative to baselines

03 Effective across three panoramic datasets
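
For readers unfamiliar with the two metrics named above, the sketch below shows how they are commonly defined in depth-estimation work: RMSE as root mean squared error and MRE as mean relative error, both evaluated only on pixels with valid ground truth. The function names and the convention that 0 marks a missing depth value are assumptions made for illustration; the paper's exact evaluation protocol is not given in this summary.

```python
import math

def valid_pairs(pred, gt):
    """Yield (prediction, ground truth) for pixels whose ground truth is valid."""
    for p, g in zip(pred, gt):
        if g > 0:  # assumption: 0 marks a missing depth measurement
            yield p, g

def rmse(pred, gt):
    """Root mean squared error over valid pixels."""
    pairs = list(valid_pairs(pred, gt))
    return math.sqrt(sum((p - g) ** 2 for p, g in pairs) / len(pairs))

def mre(pred, gt):
    """Mean relative error: |pred - gt| / gt, averaged over valid pixels."""
    pairs = list(valid_pairs(pred, gt))
    return sum(abs(p - g) / g for p, g in pairs) / len(pairs)

def relative_reduction(baseline, improved):
    """Percentage by which an error metric drops relative to a baseline."""
    return 100.0 * (baseline - improved) / baseline

# Toy flattened depth maps (illustrative values, not from the paper).
gt = [2.0, 4.0, 0.0, 5.0]    # 0.0 = no ground truth at that pixel
pred = [2.5, 3.0, 1.0, 5.0]
print(rmse(pred, gt), mre(pred, gt))
```

A statement such as "reduces RMSE by 26.2%" then corresponds to `relative_reduction(baseline_rmse, new_rmse)` under this definition.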

Abstract

In this paper, we formulate a potentially valuable panoramic depth completion (PDC) task, as panoramic 3D cameras often produce 360° depth with missing data in complex scenes. Its goal is to recover dense panoramic depths from raw sparse ones and panoramic RGB images. To deal with the PDC task, we train a deep network that takes both depth and image as inputs for dense panoramic depth recovery. However, it faces a challenging optimization problem over the network parameters due to its non-convex objective function. To address this problem, we propose a simple yet effective approach termed M^3PT: multi-modal masked pre-training. Specifically, during pre-training, we simultaneously cover up patches of the panoramic RGB image and the sparse depth with a shared random mask, then reconstruct the sparse depth in the masked regions. To the best of our knowledge, it is the first time that we…
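
The pre-training step described in the abstract can be sketched as follows: one random patch mask is drawn and applied to both the RGB image and the sparse depth map, and the reconstruction loss is computed on the sparse depth inside the masked patches only. This is a minimal sketch, not the authors' implementation. The patch size, mask ratio, L1 loss, zero fill value, and all function names are assumptions, and the network is omitted (a constant prediction stands in for its output).

```python
import random

def make_patch_mask(height, width, patch, ratio, rng):
    """Return a height x width grid of booleans, True where a patch is masked.

    Assumes height and width are multiples of `patch`.
    """
    rows, cols = height // patch, width // patch
    n_masked = int(rows * cols * ratio)
    masked = set(rng.sample(range(rows * cols), n_masked))
    return [[(y // patch) * cols + (x // patch) in masked
             for x in range(width)] for y in range(height)]

def apply_mask(channel, mask, fill=0.0):
    """Hide masked pixels of one 2-D channel by overwriting them with `fill`."""
    return [[fill if m else v for v, m in zip(row, mrow)]
            for row, mrow in zip(channel, mask)]

def masked_depth_loss(pred, sparse_depth, mask):
    """Mean L1 error over pixels that are masked AND carry a depth measurement."""
    errors = [abs(p - d)
              for prow, drow, mrow in zip(pred, sparse_depth, mask)
              for p, d, m in zip(prow, drow, mrow)
              if m and d > 0]  # assumption: 0 marks a missing depth value
    return sum(errors) / len(errors) if errors else 0.0

rng = random.Random(0)
H, W, PATCH = 4, 8, 2  # hypothetical toy sizes
rgb = [[[rng.random() for _ in range(W)] for _ in range(H)] for _ in range(3)]
sparse_depth = [[rng.choice([0.0, 1.0, 2.0]) for _ in range(W)] for _ in range(H)]

# The same mask is shared by both modalities.
mask = make_patch_mask(H, W, PATCH, ratio=0.5, rng=rng)
masked_rgb = [apply_mask(channel, mask) for channel in rgb]
masked_depth = apply_mask(sparse_depth, mask)

# A real model would predict depth from (masked_rgb, masked_depth).
pred = [[1.0] * W for _ in range(H)]
loss = masked_depth_loss(pred, sparse_depth, mask)
```

Sharing a single mask means the network cannot copy the hidden depth from the other modality at the same location; it has to infer it from the surrounding visible context.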

Code & Models

Repositories

anonymoustbd/mmmpt
none · Official

Taxonomy

Methods: Masked autoencoder