Multi-Scale Neighborhood Occupancy Masked Autoencoder for Self-Supervised Learning in LiDAR Point Clouds

Mohamed Abdelsamad; Michael Ulrich; Claudius Gläser; Abhinav Valada

arXiv:2502.20316·cs.CV·February 28, 2025

Open Access

TL;DR

This paper introduces NOMAE, a masked autoencoder for LiDAR point clouds that reconstructs occupancy only in the neighborhood of non-masked voxels, at multiple scales, and reports state-of-the-art results on nuScenes and Waymo.

Contribution

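The paper proposes NOMAE, a neighborhood occupancy masked autoencoder for self-supervised learning on LiDAR point clouds. It applies voxel masking and occupancy reconstruction at multiple scales, using a hierarchical mask generation technique to capture features of objects of different sizes.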

Findings

01 Sets a new state of the art on the nuScenes and Waymo datasets.

02 Improves semantic segmentation and 3D detection performance.

03 Efficiently handles the large empty regions in LiDAR data.

Abstract

Masked autoencoders (MAE) have shown tremendous potential for self-supervised learning (SSL) in vision and beyond. However, point clouds from LiDARs used in automated driving are particularly challenging for MAEs since large areas of the 3D volume are empty. Consequently, existing work suffers from leaking occupancy information into the decoder and has significant computational complexity, thereby limiting the SSL pre-training to only 2D bird's eye view encoders in practice. In this work, we propose the novel neighborhood occupancy MAE (NOMAE) that overcomes the aforementioned challenges by employing masked occupancy reconstruction only in the neighborhood of non-masked voxels. We incorporate voxel masking and occupancy reconstruction at multiple scales with our proposed hierarchical mask generation technique to capture features of objects of different sizes in the point cloud. NOMAEs…
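The two ideas named in the abstract, multi-scale masking and occupancy reconstruction restricted to the neighborhood of non-masked voxels, can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation. The function names (coarsen, hierarchical_masks, neighborhood_targets), the scale factors (4, 2, 1), the mask ratio, the neighborhood radius, the rule that a fine voxel is masked whenever its coarsest-scale parent is masked, and the choice to skip visible voxels as targets are all assumptions made for illustration.

```python
import random
from itertools import product


def coarsen(voxels, factor):
    """Map integer voxel coordinates to their parent cell at a coarser scale."""
    return {(x // factor, y // factor, z // factor) for x, y, z in voxels}


def hierarchical_masks(occupied, factors=(4, 2, 1), mask_ratio=0.5, seed=0):
    """Sample a mask at the coarsest scale and propagate it to finer scales.

    Assumption (not from the paper): a cell is masked at a finer scale
    exactly when its coarsest-scale parent is masked. Every factor must
    divide the largest factor.
    """
    rng = random.Random(seed)
    coarsest = max(factors)
    parents = sorted(coarsen(occupied, coarsest))
    masked_parents = set(rng.sample(parents, int(len(parents) * mask_ratio)))
    masks = {}
    for f in factors:
        step = coarsest // f  # how many cells of scale f fit along one parent edge
        masks[f] = {
            c for c in coarsen(occupied, f)
            if (c[0] // step, c[1] // step, c[2] // step) in masked_parents
        }
    return masks


def neighborhood_targets(cells, masked, radius=1):
    """Build occupancy labels only around visible (non-masked) cells.

    Returns {coordinate: 0 or 1}. Empty space far from every visible cell
    never becomes a target, so the number of targets scales with the number
    of visible cells rather than with the full 3D volume.
    """
    visible = cells - masked
    offsets = [o for o in product(range(-radius, radius + 1), repeat=3)
               if o != (0, 0, 0)]
    targets = {}
    for x, y, z in visible:
        for dx, dy, dz in offsets:
            q = (x + dx, y + dy, z + dz)
            if q not in visible:  # assumption: visible cells are not predicted
                targets[q] = 1 if q in masked else 0
    return targets


if __name__ == "__main__":
    # Toy scene: a flat 8 x 8 patch of occupied voxels.
    occupied = {(x, y, 0) for x in range(8) for y in range(8)}
    for factor, masked in hierarchical_masks(occupied).items():
        cells = coarsen(occupied, factor)
        targets = neighborhood_targets(cells, masked)
        print(factor, len(cells), len(masked), len(targets))
```

In this sketch the reconstruction targets are defined relative to visible voxels only, so the positions of the targets carry no information about where the masked voxels were, which is one way to read the abstract's point about occupancy leaking into the decoder.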

Taxonomy

Topics: Advanced Neural Network Applications · 3D Shape Modeling and Analysis · Domain Adaptation and Few-Shot Learning

Methods: Masked autoencoder