Masked Autoencoder for Self-Supervised Pre-training on Lidar Point Clouds
Georg Hess, Johan Jaxing, Elias Svensson, David Hagerman, Christoffer Petersson, Lennart Svensson

TL;DR
This paper introduces Voxel-MAE, a masked autoencoder pretraining method for voxel-based 3D point cloud representations, improving automotive 3D object detection performance and reducing annotation needs.
Contribution
We develop Voxel-MAE, the first masked autoencoder tailored for sparse, variable-density voxel point clouds in automotive settings, enhancing detection accuracy and data efficiency.
Findings
Improves 3D object detection by 1.75 mAP points on nuScenes.
Cuts the annotated data needed for comparable detection performance to 40%.
Demonstrates effectiveness of masked autoencoding in automotive point cloud pretraining.
Abstract
Masked autoencoding has become a successful pretraining paradigm for Transformer models for text, images, and, recently, point clouds. Raw automotive datasets are suitable candidates for self-supervised pre-training as they generally are cheap to collect compared to annotations for tasks like 3D object detection (OD). However, the development of masked autoencoders for point clouds has focused solely on synthetic and indoor data. Consequently, existing methods have tailored their representations and models toward small and dense point clouds with homogeneous point densities. In this work, we study masked autoencoding for point clouds in an automotive setting, which are sparse and for which the point density can vary drastically among objects in the same scene. To this end, we propose Voxel-MAE, a simple masked autoencoding pre-training scheme designed for voxel representations. We…
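The abstract describes masking over a voxel representation of a sparse point cloud. The idea can be illustrated with a minimal NumPy sketch: group points into voxels, then hide a random subset of the non-empty voxels from the encoder. This is an illustration under stated assumptions, not the authors' implementation. The function names (`voxelize`, `random_mask`), the voxel size, the 0.7 mask ratio, and the synthetic points are all hypothetical, and the reconstruction targets and network are not shown.

```python
import numpy as np


def voxelize(points, voxel_size, origin):
    """Assign each point to an integer voxel index and list the non-empty voxels."""
    idx = np.floor((points - origin) / voxel_size).astype(np.int64)
    # Unique rows are the non-empty voxels; `inverse` maps each point to its voxel row.
    voxels, inverse = np.unique(idx, axis=0, return_inverse=True)
    return voxels, inverse.reshape(-1)


def random_mask(num_voxels, mask_ratio, rng):
    """Return a boolean array that is True for voxels hidden from the encoder."""
    num_masked = int(round(mask_ratio * num_voxels))
    mask = np.zeros(num_voxels, dtype=bool)
    mask[rng.choice(num_voxels, size=num_masked, replace=False)] = True
    return mask


# Synthetic stand-in for a lidar sweep (assumption: uniform random points).
rng = np.random.default_rng(0)
points = rng.uniform(-10.0, 10.0, size=(2000, 3))
origin = np.array([-10.0, -10.0, -10.0])
voxel_size = np.array([0.5, 0.5, 4.0])  # hypothetical voxel dimensions

voxels, point_to_voxel = voxelize(points, voxel_size, origin)
mask = random_mask(len(voxels), mask_ratio=0.7, rng=rng)  # hypothetical ratio

visible_voxels = voxels[~mask]                   # what an encoder would see
masked_voxels = voxels[mask]                     # what a decoder would have to recover
visible_points = points[~mask[point_to_voxel]]   # points inside visible voxels
```

Masking at the voxel level rather than the point level means that only non-empty voxels are candidates, which matters for sparse scenes where most of the grid is empty.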
Taxonomy
Topics: Remote Sensing and LiDAR Applications · Optical measurement and interference techniques · Advanced Optical Sensing Technologies
Methods: Attention Is All You Need · Linear Layer · Softmax · Multi-Head Attention · Residual Connection · Dense Connections · Byte Pair Encoding · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Label Smoothing
