GD-MAE: Generative Decoder for MAE Pre-training on LiDAR Point Clouds
Honghui Yang, Tong He, Jiaheng Liu, Hua Chen, Boxi Wu, Binbin Lin, Xiaofei He, Wanli Ouyang

TL;DR
GD-MAE introduces a simple generative decoder for 3D point cloud pre-training that effectively restores masked geometric information, achieving state-of-the-art results with lower decoding latency and high robustness across large-scale benchmarks.
Contribution
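The paper proposes a generative decoder paradigm for 3D MAE pre-training. It avoids the hand-designed decoders and sophisticated masking strategies of earlier 3D MAE frameworks, which yields a simpler architecture, better performance, and the freedom to try different masking strategies.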
Findings
Outperforms existing methods on Waymo, KITTI, and ONCE benchmarks.
Achieves comparable accuracy with only 20% of the labeled data on Waymo.
Reduces decoding latency by over 88% compared to traditional approaches.
Abstract
Despite the tremendous progress of Masked Autoencoders (MAE) in vision domains such as images and videos, exploring MAE on large-scale 3D point clouds remains challenging due to their inherent irregularity. In contrast to previous 3D MAE frameworks, which either design a complex decoder to infer masked information from the maintained regions or adopt sophisticated masking strategies, we propose a much simpler paradigm. The core idea is to apply a Generative Decoder for MAE (GD-MAE) that automatically merges the surrounding context to restore the masked geometric knowledge in a hierarchical fusion manner. In doing so, our approach avoids heuristic decoder design and enjoys the flexibility of exploring various masking strategies. The decoding part costs less than 12% of the latency of conventional methods, while…
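The abstract's two ideas are that the decoder restores masked regions by merging surrounding context hierarchically, and that this leaves the masking strategy free to vary. Both can be illustrated with a deliberately simplified, non-learned sketch. The code below is not the GD-MAE architecture. It assumes the scene has already been rasterized into a 2D grid of scalar values, such as a bird's-eye-view occupancy map, and that is an assumption for illustration only. The real decoder is a trained network. All function names here (`random_mask`, `block_mask`, `downsample`, `hierarchical_fill`) are hypothetical. Masked cells are refilled coarse-to-fine from pooled context, which stands in for learned multi-scale fusion.

```python
import random
from typing import List, Optional

# Hypothetical toy scene: a 2D grid of scalars; None marks a masked cell.
Grid = List[List[Optional[float]]]


def random_mask(grid: Grid, ratio: float, seed: int = 0) -> Grid:
    """Hide a `ratio` fraction of individual cells, chosen uniformly at random."""
    rng = random.Random(seed)
    h, w = len(grid), len(grid[0])
    cells = [(i, j) for i in range(h) for j in range(w)]
    hidden = set(rng.sample(cells, round(ratio * len(cells))))
    return [[None if (i, j) in hidden else grid[i][j] for j in range(w)]
            for i in range(h)]


def block_mask(grid: Grid, block: int, ratio: float, seed: int = 0) -> Grid:
    """Hide a `ratio` fraction of non-overlapping block x block patches."""
    rng = random.Random(seed)
    h, w = len(grid), len(grid[0])
    tiles = [(i, j) for i in range(0, h, block) for j in range(0, w, block)]
    hidden = set(rng.sample(tiles, round(ratio * len(tiles))))
    # A cell is hidden if the top-left corner of its tile was selected.
    return [[None if (i - i % block, j - j % block) in hidden else grid[i][j]
             for j in range(w)] for i in range(h)]


def downsample(grid: Grid) -> Grid:
    """2x2 average pooling that ignores masked (None) cells."""
    h, w = len(grid), len(grid[0])
    out: Grid = []
    for i in range(0, h, 2):
        row: List[Optional[float]] = []
        for j in range(0, w, 2):
            vals = [grid[a][b] for a in range(i, min(i + 2, h))
                    for b in range(j, min(j + 2, w)) if grid[a][b] is not None]
            row.append(sum(vals) / len(vals) if vals else None)
        out.append(row)
    return out


def hierarchical_fill(grid: Grid) -> Grid:
    """Refill masked cells coarse-to-fine from pooled surrounding context.

    Visible cells are returned unchanged; a masked cell copies the value of
    its parent in the next coarser level (nearest-neighbour upsampling).
    """
    if not any(v is not None for row in grid for v in row):
        raise ValueError("at least one cell must be visible")
    pyramid = [grid]
    while len(pyramid[-1]) > 1 or len(pyramid[-1][0]) > 1:
        pyramid.append(downsample(pyramid[-1]))
    # The 1x1 top level is never None because some cell is visible.
    filled = pyramid[-1]
    for level in reversed(pyramid[:-1]):
        filled = [[level[i][j] if level[i][j] is not None else filled[i // 2][j // 2]
                   for j in range(len(level[0]))] for i in range(len(level))]
    return filled


if __name__ == "__main__":
    scene = [[float(i + j) for j in range(8)] for i in range(8)]
    for name, masked in [("random", random_mask(scene, 0.5, seed=1)),
                         ("block", block_mask(scene, 2, 0.5, seed=1))]:
        restored = hierarchical_fill(masked)
        err = sum(abs(restored[i][j] - scene[i][j])
                  for i in range(8) for j in range(8)) / 64
        print(f"{name} masking: mean abs error {err:.3f}")
```

Because `hierarchical_fill` never inspects how the mask was produced, swapping `random_mask` for `block_mask` needs no change to the decoding step. This mirrors, in toy form, the abstract's claim that GD-MAE is free to explore masking strategies without redesigning the decoder.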
Taxonomy
Topics: 3D Shape Modeling and Analysis · Human Pose and Action Recognition · Advanced Neural Network Applications
Methods: Masked autoencoder
