Masked Autoencoder for Self-Supervised Pre-training on Lidar Point Clouds

Georg Hess; Johan Jaxing; Elias Svensson; David Hagerman; Christoffer Petersson; Lennart Svensson

arXiv:2207.00531·cs.CV·March 10, 2023

TL;DR

This paper introduces Voxel-MAE, a masked autoencoder pretraining method for voxel-based 3D point cloud representations, improving automotive 3D object detection performance and reducing annotation needs.

Contribution

We develop Voxel-MAE, the first masked autoencoder tailored for sparse, variable-density voxel point clouds in automotive settings, enhancing detection accuracy and data efficiency.

Findings

01. Improves 3D object detection by 1.75 mAP points on nuScenes.

02. Reduces the annotated data needed to 40% for comparable detection performance.

03. Demonstrates the effectiveness of masked autoencoding for pre-training on automotive point clouds.

Abstract

Masked autoencoding has become a successful pretraining paradigm for Transformer models for text, images, and, recently, point clouds. Raw automotive datasets are suitable candidates for self-supervised pre-training as they generally are cheap to collect compared to annotations for tasks like 3D object detection (OD). However, the development of masked autoencoders for point clouds has focused solely on synthetic and indoor data. Consequently, existing methods have tailored their representations and models toward small and dense point clouds with homogeneous point densities. In this work, we study masked autoencoding for point clouds in an automotive setting, which are sparse and for which the point density can vary drastically among objects in the same scene. To this end, we propose Voxel-MAE, a simple masked autoencoding pre-training scheme designed for voxel representations. We…
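The core idea named in the abstract, masking part of a voxelized point cloud and asking a model to reconstruct it, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `voxelize` and `mask_voxels`, the voxel size of 1.0, and the mask ratio of 0.7 are all assumptions chosen for illustration, and the encoder and reconstruction losses are omitted entirely.

```python
import random
from collections import defaultdict


def voxelize(points, voxel_size):
    """Group points by the integer index of the voxel that contains them.

    Only non-empty voxels are stored, which suits sparse lidar scans.
    """
    voxels = defaultdict(list)
    for x, y, z in points:
        key = (int(x // voxel_size), int(y // voxel_size), int(z // voxel_size))
        voxels[key].append((x, y, z))
    return dict(voxels)


def mask_voxels(voxels, mask_ratio, seed=0):
    """Randomly split non-empty voxels into visible and masked sets.

    The visible voxels would be fed to an encoder; the masked voxels
    would serve as reconstruction targets (hypothetical setup).
    """
    rng = random.Random(seed)
    keys = sorted(voxels)  # sort first so the shuffle is reproducible
    rng.shuffle(keys)
    n_masked = int(round(mask_ratio * len(keys)))
    masked = {k: voxels[k] for k in keys[:n_masked]}
    visible = {k: voxels[k] for k in keys[n_masked:]}
    return visible, masked


# Toy point cloud: 200 random points in a 10 x 10 x 2 box (illustrative only).
_rng = random.Random(42)
points = [
    (_rng.uniform(0, 10), _rng.uniform(0, 10), _rng.uniform(0, 2))
    for _ in range(200)
]

voxels = voxelize(points, voxel_size=1.0)            # assumed voxel size
visible, masked = mask_voxels(voxels, mask_ratio=0.7)  # assumed mask ratio
print(len(voxels), len(visible), len(masked))
```

Masking whole voxels rather than individual points keeps the split aligned with the voxel representation, so a voxel-based backbone would see the same input structure during pre-training as during detection.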

Code & Models

Repositories

georghess/voxel-mae
PyTorch · Official

Videos

Masked Autoencoder for Self-Supervised Pre-training on Lidar Point Clouds · YouTube

Taxonomy

Topics: Remote Sensing and LiDAR Applications · Optical measurement and interference techniques · Advanced Optical Sensing Technologies

Methods: Attention Is All You Need · Linear Layer · Softmax · Multi-Head Attention · Residual Connection · Dense Connections · Byte Pair Encoding · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Label Smoothing