Multi-Scale Neighborhood Occupancy Masked Autoencoder for Self-Supervised Learning in LiDAR Point Clouds
Mohamed Abdelsamad, Michael Ulrich, Claudius Gläser, Abhinav Valada

TL;DR
This paper introduces NOMAE, a masked autoencoder for LiDAR point clouds that reconstructs masked occupancy only in the neighborhood of non-masked voxels and applies masking at multiple scales, reporting state-of-the-art results on nuScenes and Waymo.
Contribution
The paper proposes NOMAE, a neighborhood occupancy masked autoencoder that effectively captures multi-scale features in LiDAR point clouds for self-supervised learning.
Findings
Sets new state-of-the-art on nuScenes and Waymo datasets.
Improves semantic segmentation and 3D detection performance.
Efficiently handles large empty regions in LiDAR data.
Abstract
Masked autoencoders (MAE) have shown tremendous potential for self-supervised learning (SSL) in vision and beyond. However, point clouds from LiDARs used in automated driving are particularly challenging for MAEs since large areas of the 3D volume are empty. Consequently, existing work suffers from leaking occupancy information into the decoder and has significant computational complexity, thereby limiting the SSL pre-training to only 2D bird's eye view encoders in practice. In this work, we propose the novel neighborhood occupancy MAE (NOMAE) that overcomes the aforementioned challenges by employing masked occupancy reconstruction only in the neighborhood of non-masked voxels. We incorporate voxel masking and occupancy reconstruction at multiple scales with our proposed hierarchical mask generation technique to capture features of objects of different sizes in the point cloud. NOMAEs…
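The core idea in the abstract, reconstructing occupancy only near non-masked voxels, can be illustrated with a minimal single-scale sketch. This is not the authors' implementation: the function names (`voxelize`, `neighborhood_occupancy_targets`), the random masking scheme, the cubic neighborhood, and the parameters `mask_ratio` and `radius` are all assumptions chosen for illustration, and the hierarchical multi-scale mask generation is not covered here.

```python
import numpy as np

def voxelize(points, voxel_size):
    """Map points of shape (N, 3) to unique integer voxel coordinates (M, 3)."""
    coords = np.floor(points / voxel_size).astype(np.int64)
    return np.unique(coords, axis=0)

def neighborhood_occupancy_targets(occupied, mask_ratio, radius, rng):
    """Split occupied voxels into visible/masked and build occupancy queries.

    Queries are restricted to the cubic neighborhood (Chebyshev distance
    <= radius) of visible voxels, so the large empty volume far from any
    visible voxel is never queried. All of this is an illustrative assumption.
    """
    n = len(occupied)
    n_masked = int(round(mask_ratio * n))
    perm = rng.permutation(n)
    masked = occupied[perm[:n_masked]]
    visible = occupied[perm[n_masked:]]

    # All integer offsets inside a cube of side 2 * radius + 1.
    r = np.arange(-radius, radius + 1)
    offsets = np.stack(np.meshgrid(r, r, r, indexing="ij"), axis=-1).reshape(-1, 3)

    # Candidate queries: every neighbor of every visible voxel, deduplicated.
    candidates = (visible[:, None, :] + offsets[None, :, :]).reshape(-1, 3)
    candidates = np.unique(candidates, axis=0)

    # Visible voxels are already known to the encoder, so they are not queried.
    visible_set = {tuple(v) for v in visible}
    occupied_set = {tuple(v) for v in occupied}
    kept = [c for c in candidates if tuple(c) not in visible_set]
    queries = np.array(kept, dtype=np.int64).reshape(-1, 3)

    # Label is True where the query coincides with a (masked) occupied voxel.
    labels = np.array([tuple(q) in occupied_set for q in queries], dtype=bool)
    return visible, masked, queries, labels

# Toy usage on a synthetic point cloud (not real LiDAR data).
rng = np.random.default_rng(0)
points = rng.uniform(0.0, 10.0, size=(500, 3))
occupied = voxelize(points, voxel_size=1.0)
visible, masked, queries, labels = neighborhood_occupancy_targets(
    occupied, mask_ratio=0.7, radius=1, rng=rng
)
print(len(visible), len(masked), len(queries), int(labels.sum()))
```

In this sketch a decoder would be trained to predict `labels` at the `queries` positions from features of the `visible` voxels. Masked voxels that lie outside every visible neighborhood are simply never queried, which is what keeps the number of targets bounded by the visible set rather than by the full 3D volume.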
Taxonomy
Topics: Advanced Neural Network Applications · 3D Shape Modeling and Analysis · Domain Adaptation and Few-Shot Learning
Methods: Masked autoencoder
