Point-M2AE: Multi-scale Masked Autoencoders for Hierarchical Point Cloud Pre-training
Renrui Zhang, Ziyu Guo, Rongyao Fang, Bin Zhao, Dong Wang, Yu Qiao, Hongsheng Li, Peng Gao

TL;DR
Point-M2AE introduces a multi-scale masked autoencoder framework for hierarchical self-supervised learning of 3D point clouds, achieving state-of-the-art results in various 3D recognition tasks.
Contribution
It proposes a multi-scale pyramid encoder-decoder architecture, combined with a multi-scale masking strategy and local spatial self-attention, for pre-training on 3D point clouds.
Findings
Achieves 92.9% accuracy on ModelNet40 with a linear SVM on top of the frozen encoder.
Surpasses the prior state of the art on ScanObjectNN with 86.43% accuracy.
Improves performance on few-shot classification, part segmentation, and 3D object detection.
Abstract
Masked Autoencoders (MAE) have shown great potential in self-supervised pre-training for language and 2D image transformers. However, how to exploit masked autoencoding for learning 3D representations of irregular point clouds remains an open question. In this paper, we propose Point-M2AE, a strong Multi-scale MAE pre-training framework for hierarchical self-supervised learning of 3D point clouds. Unlike the standard transformer in MAE, we modify the encoder and decoder into pyramid architectures to progressively model spatial geometries and capture both fine-grained and high-level semantics of 3D shapes. For the encoder that downsamples point tokens by stages, we design a multi-scale masking strategy to generate consistent visible regions across scales, and adopt a local spatial self-attention mechanism during fine-tuning to focus on neighboring patterns. By multi-scale…
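The phrase "consistent visible regions across scales" can be made concrete with a small sketch. The code below is not taken from the paper's implementation. It assumes one plausible way to get consistency: tokens are masked at random at the coarsest scale, and a finer-scale token is kept visible only if it is among the k nearest neighbors of a visible coarse token. The function names, the every-4th-point downsampling, and the parameter values are all hypothetical and chosen for illustration.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two 3D points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_indices(center, points, k):
    """Indices of the k points nearest to center."""
    order = sorted(range(len(points)), key=lambda i: sq_dist(center, points[i]))
    return order[:k]


def multiscale_visible_sets(fine_pts, coarse_pts, k, mask_ratio, rng):
    """Mask coarse tokens at random, then back-project visibility to the fine scale.

    Assumption (not confirmed by the abstract): a fine token is visible
    only if it is one of the k nearest neighbors of a visible coarse token.
    """
    n_coarse = len(coarse_pts)
    n_visible = max(1, round(n_coarse * (1 - mask_ratio)))
    coarse_visible = sorted(rng.sample(range(n_coarse), n_visible))

    fine_visible = set()
    for c in coarse_visible:
        fine_visible.update(knn_indices(coarse_pts[c], fine_pts, k))
    return coarse_visible, sorted(fine_visible)


rng = random.Random(0)
fine_pts = [(rng.random(), rng.random(), rng.random()) for _ in range(32)]
coarse_pts = fine_pts[::4]  # hypothetical downsampling: keep every 4th point

coarse_vis, fine_vis = multiscale_visible_sets(
    fine_pts, coarse_pts, k=4, mask_ratio=0.75, rng=rng
)
print("visible coarse tokens:", coarse_vis)
print("visible fine tokens:", fine_vis)
```

Because the fine-scale visible set is derived from the coarse-scale one, an encoder that only processes visible tokens sees the same spatial regions at every stage, which is the property the abstract describes.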
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Human Pose and Action Recognition
Methods: Masked autoencoder · Support Vector Machine
