M$^3$CS: Multi-Target Masked Point Modeling with Learnable Codebook and Siamese Decoders
Qibo Qiu, Honghui Yang, Wenxiao Wang, Shun Zhang, Haiming Gao, Haochao Ying, Wei Hua, Xiaofei He

TL;DR
M$^3$CS is a multi-target masked point modeling framework for point cloud pre-training. It combines a learnable codebook with siamese decoders so that the model learns both geometric details and semantic contexts, which improves downstream task performance.
Contribution
The paper proposes a multi-target masked point modeling approach that uses a learnable codebook and siamese decoders to improve feature representation in point cloud pre-training.
Findings
Outperforms existing methods in classification tasks.
Achieves superior results in segmentation tasks.
Effectively captures both geometric details and semantic contexts.
Abstract
Masked point modeling has become a promising scheme of self-supervised pre-training for point clouds. Existing methods reconstruct either the original points or related features as the objective of pre-training. However, considering the diversity of downstream tasks, it is necessary for the model to have both low- and high-level representation modeling capabilities to capture geometric details and semantic contexts during pre-training. To this end, M$^3$CS is proposed to enable the model with the above abilities. Specifically, with masked point cloud as input, M$^3$CS introduces two decoders to predict masked representations and the original points simultaneously. While an extra decoder doubles parameters for the decoding process and may lead to overfitting, we propose siamese decoders to keep the amount of learnable parameters unchanged. Further, we propose an online codebook…
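The weight-sharing idea behind the siamese decoders can be illustrated with a minimal sketch. This is not the authors' implementation: the `Decoder` class, the `siamese_decode` helper, and the per-target query vectors are all hypothetical names and design choices introduced only for illustration, and a single linear layer stands in for a real decoder. The sketch only shows that running one decoder twice yields two outputs while the learnable parameter count stays that of a single decoder.

```python
import random


class Decoder:
    """Toy stand-in for a decoder: one square linear layer (an assumption)."""

    def __init__(self, dim, seed=0):
        rng = random.Random(seed)
        self.weight = [[rng.uniform(-0.1, 0.1) for _ in range(dim)]
                       for _ in range(dim)]

    def forward(self, tokens):
        # Apply the same linear map to every token vector.
        return [[sum(w * v for w, v in zip(row, t)) for row in self.weight]
                for t in tokens]

    def num_params(self):
        return sum(len(row) for row in self.weight)


def siamese_decode(decoder, latent, point_query, feature_query):
    """Run ONE decoder twice, once per prediction target.

    The per-target query vectors are a hypothetical way to tell the
    two branches apart; the paper may condition the branches differently.
    """
    def add(token, query):
        return [a + b for a, b in zip(token, query)]

    point_branch = decoder.forward([add(t, point_query) for t in latent])
    feature_branch = decoder.forward([add(t, feature_query) for t in latent])
    return point_branch, feature_branch


dim = 4
decoder = Decoder(dim)
# Three toy latent tokens standing in for encoder outputs.
latent = [[0.1 * i + 0.01 * j for j in range(dim)] for i in range(3)]
point_query = [1.0, 0.0, 0.0, 0.0]
feature_query = [0.0, 1.0, 0.0, 0.0]

points_out, feats_out = siamese_decode(decoder, latent, point_query, feature_query)

siamese_params = decoder.num_params()        # shared weights, counted once
two_decoder_params = 2 * decoder.num_params()  # what two separate decoders would cost
```

In this toy setup the two branches produce different outputs because their inputs differ, while `siamese_params` is half of `two_decoder_params`, matching the abstract's point that the extra prediction target adds no decoder parameters.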
Taxonomy
Topics: 3D Shape Modeling and Analysis · 3D Surveying and Cultural Heritage · Image Processing and 3D Reconstruction
