UniM$^2$AE: Multi-modal Masked Autoencoders with Unified 3D Representation for 3D Perception in Autonomous Driving

Jian Zou; Tianyu Huang; Guanglei Yang; Zhenhua Guo; Tao Luo; Chun-Mei Feng; Wangmeng Zuo

arXiv:2308.10421 · cs.CV · August 26, 2024

TL;DR

UniM$^2$AE introduces a multi-modal masked autoencoder framework that unifies image and LiDAR data into a 3D volume space, improving 3D perception tasks for autonomous driving through efficient multi-modal fusion.

Contribution

The paper proposes a novel multi-modal autoencoder with a unified 3D representation and an interactive module, enhancing multi-modal feature integration for autonomous driving perception tasks.

Findings

01. Improves 3D object detection by 1.2% NDS

02. Enhances BEV map segmentation by 6.5% mIoU

03. Demonstrates effective multi-modal fusion in autonomous driving

Abstract

Masked Autoencoders (MAE) play a pivotal role in learning potent representations, delivering outstanding results across various 3D perception tasks essential for autonomous driving. In real-world driving scenarios, it is commonplace to deploy multiple sensors for comprehensive environment perception. Although integrating multi-modal features from these sensors can produce rich and powerful features, MAE methods face a noticeable challenge in addressing this integration because of the substantial disparity between the modalities. This research delves into multi-modal Masked Autoencoders tailored for a unified representation space in autonomous driving, aiming to pioneer a more efficient fusion of two distinct modalities. To marry the semantics inherent in images with the geometric intricacies of LiDAR point clouds, we propose UniM$^2$AE. This model stands as a potent…
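As a rough illustration of the masking step that any masked autoencoder relies on, the sketch below hides a random subset of tokens from each of two modalities, so that only the visible tokens would reach an encoder. It is a generic MAE-style sketch and not the UniM$^2$AE implementation: the function name `random_mask`, the token counts, and the mask ratios are all hypothetical and chosen only for illustration.

```python
import random


def random_mask(num_tokens, mask_ratio, rng):
    """Split token indices into (visible, masked) lists at the given ratio."""
    if not 0.0 <= mask_ratio < 1.0:
        raise ValueError("mask_ratio must be in [0, 1)")
    num_masked = int(num_tokens * mask_ratio)
    order = list(range(num_tokens))
    rng.shuffle(order)  # random permutation of token indices
    masked = sorted(order[:num_masked])
    visible = sorted(order[num_masked:])
    return visible, masked


rng = random.Random(0)  # fixed seed so the split is reproducible

# Hypothetical token counts and ratios (assumptions, not values from the paper).
img_visible, img_masked = random_mask(num_tokens=16, mask_ratio=0.75, rng=rng)
pts_visible, pts_masked = random_mask(num_tokens=20, mask_ratio=0.5, rng=rng)

print(len(img_visible), len(img_masked))
print(len(pts_visible), len(pts_masked))
```

In a full pipeline, an encoder would process only the visible tokens of each modality, and a decoder would be trained to reconstruct the masked ones.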

Code & Models

Repositories

hollow-503/unim2ae (PyTorch, official)

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Robotics and Sensor-Based Localization

Methods: Masked autoencoder