PiMAE: Point Cloud and Image Interactive Masked Autoencoders for 3D Object Detection

Anthony Chen; Kevin Zhang; Renrui Zhang; Zihan Wang; Yuheng Lu; Yandong Guo; Shanghang Zhang

arXiv:2303.08129·cs.CV·March 15, 2023·5 cites

Open Access · 1 Repo

TL;DR

PiMAE is a novel self-supervised pre-training framework that enhances 3D object detection by promoting interaction between point cloud and RGB image modalities through innovative masking, cross-modal alignment, and shared decoding strategies.

Contribution

The paper introduces PiMAE, a new multi-modality masked autoencoder framework that significantly improves 3D and 2D detection performance by fostering cross-modal interactions.

Findings

01. Improves 3D detectors by 2.9%
02. Enhances 2D detectors by 6.7%
03. Boosts few-shot classifiers by 2.4%

Abstract

Masked Autoencoders learn strong visual representations and achieve state-of-the-art results in several independent modalities, yet very few works have addressed their capabilities in multi-modality settings. In this work, we focus on point cloud and RGB image data, two modalities that are often presented together in the real world, and explore their meaningful interactions. To improve upon the cross-modal synergy in existing works, we propose PiMAE, a self-supervised pre-training framework that promotes 3D and 2D interaction through three aspects. Specifically, we first notice the importance of masking strategies between the two sources and utilize a projection module to complementarily align the mask and visible tokens of the two modalities. Then, we utilize a well-crafted two-branch MAE pipeline with a novel shared decoder to promote cross-modality interaction in the mask tokens.…
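The complementary masking idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the pinhole projection, the function names (`project_points`, `complementary_masks`), the patch size, the mask ratio, and the exact complement rule (image patches under visible point tokens are masked) are all assumptions made for illustration.

```python
import numpy as np

def project_points(centers, K):
    """Pinhole projection of (N, 3) camera-frame points to (N, 2) pixel coords.
    Assumes every point has positive depth (z > 0)."""
    uvw = centers @ K.T
    return uvw[:, :2] / uvw[:, 2:3]

def complementary_masks(centers, K, image_hw, patch, mask_ratio, rng):
    """Hypothetical helper: mask point tokens at random, then mask the image
    patches that the *visible* point tokens project into."""
    n = centers.shape[0]
    h, w = image_hw
    gh, gw = h // patch, w // patch  # patch grid height and width

    # Randomly choose which point tokens are masked.
    n_masked = int(round(mask_ratio * n))
    point_masked = np.zeros(n, dtype=bool)
    point_masked[rng.choice(n, size=n_masked, replace=False)] = True

    # Map each point-token centre to the image patch it lands in
    # (out-of-frame projections are clipped to the nearest border patch).
    uv = project_points(centers, K)
    col = np.clip((uv[:, 0] // patch).astype(int), 0, gw - 1)
    row = np.clip((uv[:, 1] // patch).astype(int), 0, gh - 1)
    patch_idx = row * gw + col

    # Assumed complement rule: patches under visible point tokens are masked;
    # every other patch stays visible.
    patch_masked = np.zeros(gh * gw, dtype=bool)
    patch_masked[patch_idx[~point_masked]] = True
    return point_masked, patch_masked, patch_idx

# Toy data: 64 token centres in front of a made-up camera.
rng = np.random.default_rng(0)
centers = np.column_stack([
    rng.uniform(-1, 1, 64),   # x
    rng.uniform(-1, 1, 64),   # y
    rng.uniform(1, 3, 64),    # z (depth, kept positive)
])
K = np.array([[100.0, 0.0, 112.0],
              [0.0, 100.0, 112.0],
              [0.0, 0.0, 1.0]])
point_masked, patch_masked, patch_idx = complementary_masks(
    centers, K, image_hw=(224, 224), patch=16, mask_ratio=0.6, rng=rng
)
```

Under this rule, a region seen by the point branch is hidden from the image branch, so each encoder is pushed to rely on the other modality for that region. A patch that contains both a visible and a masked point token ends up masked in this sketch; the paper may resolve such overlaps differently.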

Code & Models

Repositories

blvlab/pimae
PyTorch · Official

Taxonomy

Topics: 3D Surveying and Cultural Heritage · Advanced Neural Network Applications · 3D Shape Modeling and Analysis

Methods: Masked autoencoder · ALIGN