MonoMAE: Enhancing Monocular 3D Detection through Depth-Aware Masked Autoencoders

Xueying Jiang; Sheng Jin; Xiaoqin Zhang; Ling Shao; Shijian Lu

arXiv:2405.07696 · cs.CV · October 16, 2024

TL;DR

MonoMAE introduces a depth-aware masked autoencoder approach for monocular 3D detection, effectively handling occlusions by masking and reconstructing object features, leading to improved accuracy and domain generalization.

Contribution

The paper proposes depth-aware masking and lightweight query completion to improve 3D object detection from monocular images, especially under occlusion.

Findings

1. Achieves superior detection performance on both occluded and non-occluded objects.

2. Learns representations that generalize well across different domains.

3. Improves 3D localization and identification accuracy.

Abstract

Monocular 3D object detection aims for precise 3D localization and identification of objects from a single-view image. Despite recent progress, it often struggles with pervasive object occlusions, which complicate and degrade the prediction of object dimensions, depths, and orientations. We design MonoMAE, a monocular 3D detector inspired by Masked Autoencoders that addresses the object occlusion issue by masking and reconstructing objects in the feature space. MonoMAE consists of two novel designs. The first is depth-aware masking, which selectively masks parts of non-occluded object queries in the feature space to simulate occluded object queries during network training. It balances the masked and preserved portions of each non-occluded query adaptively according to depth information. The second is lightweight query completion that works…
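The depth-aware masking idea in the abstract can be illustrated with a small sketch. The abstract does not give the actual mapping from depth to mask ratio, so everything below is an assumption made for illustration: the linear mapping, the choice that nearer objects are masked more heavily, the bounds `r_min` and `r_max`, and the function names `mask_ratio_from_depth` and `depth_aware_mask` are all hypothetical. A query is modeled as a plain list of floats rather than a real feature tensor.

```python
import random


def mask_ratio_from_depth(depth, max_depth, r_min=0.1, r_max=0.9):
    """Hypothetical linear mapping from object depth to mask ratio.

    Assumption (not from the source): nearer objects get a larger ratio.
    """
    d = min(max(depth, 0.0), max_depth)   # clamp depth to [0, max_depth]
    closeness = 1.0 - d / max_depth       # 1.0 = nearest, 0.0 = farthest
    return r_min + (r_max - r_min) * closeness


def depth_aware_mask(query, depth, max_depth, rng):
    """Zero out a depth-dependent fraction of one query feature vector."""
    ratio = mask_ratio_from_depth(depth, max_depth)
    n_mask = round(ratio * len(query))
    masked_idx = set(rng.sample(range(len(query)), n_mask))
    masked = [0.0 if i in masked_idx else v for i, v in enumerate(query)]
    return masked, sorted(masked_idx)


rng = random.Random(0)                    # fixed seed for repeatability
query = [1.0] * 10                        # toy non-occluded object query

near_masked, near_idx = depth_aware_mask(query, depth=5.0, max_depth=50.0, rng=rng)
far_masked, far_idx = depth_aware_mask(query, depth=45.0, max_depth=50.0, rng=rng)

print(len(near_idx), len(far_idx))        # number of masked entries per query
```

Under these assumed settings the near query loses 8 of its 10 entries and the far query loses 2, so the balance between masked and preserved portions changes with depth. A real detector would apply such a mask to learned query features and train a completion module to reconstruct them, which this sketch does not attempt.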

Videos

MonoMAE: Enhancing Monocular 3D Detection through Depth-Aware Masked Autoencoders · SlidesLive

Taxonomy

Topics: Industrial Vision Systems and Defect Detection · Advanced Neural Network Applications · 3D Surveying and Cultural Heritage