Masked Clustering Prediction for Unsupervised Point Cloud Pre-training

Bin Ren; Xiaoshui Huang; Mengyuan Liu; Hong Liu; Fabio Poiesi; Nicu Sebe; Guofeng Mei

arXiv:2508.08910 · cs.CV · August 13, 2025

TL;DR

MaskClu is an unsupervised pre-training method for vision transformers (ViTs) on 3D point clouds. It combines masked point modeling, clustering, and contrastive learning to capture dense semantic features, improving performance on 3D tasks such as part segmentation, semantic segmentation, and object detection.
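The TL;DR names masked point modeling as one ingredient. The sketch below shows the patch-and-mask step as it is commonly done in Point-MAE-style pipelines: farthest point sampling (FPS) picks patch centers, k-nearest neighbours form patches, and a random subset of patches is hidden from the encoder. This is an illustrative assumption. The summary does not specify MaskClu's patching scheme, patch size, or mask ratio, and the function names and the 0.6 ratio are hypothetical. Plain Python is used so the sketch runs without dependencies.

```python
import math
import random

Point = tuple[float, float, float]


def farthest_point_sampling(points: list[Point], k: int, seed: int = 0) -> list[int]:
    """Return k indices spread over the cloud: each pick is the point farthest from all earlier picks."""
    k = min(k, len(points))
    first = random.Random(seed).randrange(len(points))
    chosen = [first]
    # Distance from every point to its nearest already-chosen center.
    nearest = [math.dist(p, points[first]) for p in points]
    while len(chosen) < k:
        nxt = max(range(len(points)), key=nearest.__getitem__)
        chosen.append(nxt)
        nearest = [min(d, math.dist(p, points[nxt])) for d, p in zip(nearest, points)]
    return chosen


def group_patches(points: list[Point], centers: list[int], patch_size: int) -> list[list[int]]:
    """For each center index, collect the indices of its patch_size nearest neighbours (kNN)."""
    return [
        sorted(range(len(points)), key=lambda i: math.dist(points[i], points[c]))[:patch_size]
        for c in centers
    ]


def random_patch_mask(num_patches: int, mask_ratio: float, seed: int = 0) -> list[bool]:
    """True marks a patch that is hidden from the encoder and must be predicted."""
    num_masked = round(num_patches * mask_ratio)
    hidden = set(random.Random(seed).sample(range(num_patches), num_masked))
    return [i in hidden for i in range(num_patches)]


# Usage on a random toy cloud (hypothetical sizes: 256 points, 16 patches of 8 points).
rng = random.Random(42)
cloud = [(rng.random(), rng.random(), rng.random()) for _ in range(256)]
centers = farthest_point_sampling(cloud, k=16)
patches = group_patches(cloud, centers, patch_size=8)
mask = random_patch_mask(len(patches), mask_ratio=0.6)
print(sum(mask), "of", len(mask), "patches masked")  # prints: 10 of 16 patches masked
```

Each patch contains its own center (distance zero), so the first index of every patch is the FPS center it was grown from.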

Contribution

The paper introduces MaskClu, a novel approach that integrates clustering-based reconstruction with contrastive learning to improve semantic understanding in point cloud ViTs.
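MaskClu is described as reconstructing both cluster assignments and cluster centers from masked point clouds. One way to make that concrete, under assumptions not stated in the summary, is to run k-means on per-patch features to obtain targets, then score the model's predictions for masked patches with a cross-entropy term on the cluster id plus a squared-error term on the cluster center. What MaskClu actually clusters, how many clusters it uses, and how it weights the two terms are not given here; `kmeans`, `cluster_prediction_loss`, the farthest-first initialisation, and the equal weighting are hypothetical.

```python
import math

Vec = list[float]


def kmeans(features: list[Vec], k: int, iters: int = 20) -> tuple[list[int], list[Vec]]:
    """Lloyd's k-means with farthest-first initialisation; returns (assignments, centers)."""
    centers = [list(features[0])]
    while len(centers) < k:
        # Farthest-first init: next center is the feature farthest from every current center.
        far = max(features, key=lambda f: min(math.dist(f, c) for c in centers))
        centers.append(list(far))

    def nearest(f: Vec) -> int:
        return min(range(k), key=lambda c: math.dist(f, centers[c]))

    for _ in range(iters):
        assign = [nearest(f) for f in features]
        for c in range(k):
            members = [f for f, a in zip(features, assign) if a == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return [nearest(f) for f in features], centers


def log_softmax(logits: Vec) -> Vec:
    """Numerically stable log-softmax (subtracts the max before exponentiating)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def cluster_prediction_loss(pred_logits, pred_centers, masked_ids, assign, centers) -> float:
    """Mean over masked patches of cross-entropy on the cluster id plus squared error to that cluster's center."""
    total = 0.0
    for i in masked_ids:
        target = assign[i]
        ce = -log_softmax(pred_logits[i])[target]
        se = sum((p - t) ** 2 for p, t in zip(pred_centers[i], centers[target]))
        total += ce + se
    return total / len(masked_ids)


# Usage: two well-separated groups of toy patch features.
feats = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
assign, centers = kmeans(feats, k=2)
masked = [1, 4]  # pretend these two patches were hidden
good_logits = [[20.0 if c == assign[i] else 0.0 for c in range(2)] for i in range(6)]
good_centers = [centers[assign[i]] for i in range(6)]
print(cluster_prediction_loss(good_logits, good_centers, masked, assign, centers))  # close to 0
```

Only masked patches contribute to the loss, mirroring the masked-prediction setup; visible patches are left out of the average.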

Findings

01. Outperforms existing methods on part segmentation and semantic segmentation
02. Achieves state-of-the-art results in 3D object detection
03. Enhances semantic feature richness in point cloud representations

Abstract

Vision transformers (ViTs) have recently been widely applied to 3D point cloud understanding, with masked autoencoding as the predominant pre-training paradigm. However, the challenge of learning dense and informative semantic features from point clouds via standard ViTs remains underexplored. We propose MaskClu, a novel unsupervised pre-training method for ViTs on 3D point clouds that integrates masked point modeling with clustering-based learning. MaskClu is designed to reconstruct both cluster assignments and cluster centers from masked point clouds, thus encouraging the model to capture dense semantic information. Additionally, we introduce a global contrastive learning mechanism that enhances instance-level feature learning by contrasting different masked views of the same point cloud. By jointly optimizing these complementary objectives, i.e., dense semantic reconstruction, and…
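The global contrastive mechanism in the abstract contrasts different masked views of the same point cloud. A standard way to implement such an objective is InfoNCE: pool each view into one embedding, L2-normalize it, and train so that the two views of the same cloud score higher than views of the other clouds in the batch. Treat this as a generic sketch. The use of InfoNCE, mean pooling, the symmetric form, and the temperature of 0.1 are assumptions, not details confirmed by the abstract, and the function names are hypothetical.

```python
import math

Vec = list[float]


def mean_pool(tokens: list[Vec]) -> Vec:
    """Collapse the per-patch token features of one view into a single global embedding."""
    return [sum(col) / len(tokens) for col in zip(*tokens)]


def l2_normalize(v: Vec) -> Vec:
    norm = math.sqrt(sum(x * x for x in v)) or 1.0  # guard against a zero vector
    return [x / norm for x in v]


def info_nce(view_a: list[Vec], view_b: list[Vec], temperature: float = 0.1) -> float:
    """Symmetric InfoNCE: row i of view_a is the positive for row i of view_b; other rows are negatives."""
    za = [l2_normalize(v) for v in view_a]
    zb = [l2_normalize(v) for v in view_b]
    # Cosine similarities scaled by the temperature.
    sim = [[sum(x * y for x, y in zip(a, b)) / temperature for b in zb] for a in za]

    def row_cross_entropy(matrix: list[Vec]) -> float:
        loss = 0.0
        for i, row in enumerate(matrix):
            m = max(row)
            log_z = m + math.log(sum(math.exp(s - m) for s in row))
            loss += log_z - row[i]  # equals -log softmax(row)[i]
        return loss / len(matrix)

    sim_t = [list(col) for col in zip(*sim)]  # view_b -> view_a direction
    return 0.5 * (row_cross_entropy(sim) + row_cross_entropy(sim_t))


# Usage: a batch of two clouds, each seen through two masked views (toy token features).
view1 = [mean_pool([[1.0, 0.1], [0.9, -0.1]]), mean_pool([[0.0, 1.0], [0.1, 0.9]])]
view2 = [mean_pool([[1.0, 0.0], [0.8, 0.1]]), mean_pool([[0.1, 1.0], [-0.1, 0.8]])]
print(info_nce(view1, view2))        # small: matching rows agree
print(info_nce(view1, view2[::-1]))  # large: positives swapped
```

Reversing `view2` breaks the pairing between views of the same cloud, which is why the second loss is larger; the symmetric average simply applies the same cross-entropy in both directions.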
