Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling
Xumin Yu, Lulu Tang, Yongming Rao, Tiejun Huang, Jie Zhou, Jiwen Lu

TL;DR
Point-BERT introduces a BERT-inspired pre-training method for 3D point cloud Transformers using masked point modeling, significantly enhancing classification accuracy and transferability to new tasks.
Contribution
The paper proposes a Masked Point Modeling (MPM) pre-training strategy built on a dVAE-based point tokenizer, improving standard point cloud Transformers while relying on fewer handcrafted designs.
Findings
Achieves 93.8% accuracy on ModelNet40
Surpasses existing models on ScanObjectNN
Enhances few-shot point cloud classification
Abstract
We present Point-BERT, a new paradigm for learning Transformers to generalize the concept of BERT to 3D point cloud. Inspired by BERT, we devise a Masked Point Modeling (MPM) task to pre-train point cloud Transformers. Specifically, we first divide a point cloud into several local point patches, and a point cloud Tokenizer with a discrete Variational AutoEncoder (dVAE) is designed to generate discrete point tokens containing meaningful local information. Then, we randomly mask out some patches of input point clouds and feed them into the backbone Transformers. The pre-training objective is to recover the original point tokens at the masked locations under the supervision of point tokens obtained by the Tokenizer. Extensive experiments demonstrate that the proposed BERT-style pre-training strategy significantly improves the performance of standard point cloud Transformers. Equipped with…
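The pipeline described in the abstract (split the cloud into local patches, mask some patches, predict the tokens at the masked positions) can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation. The abstract does not say how patches are formed, so farthest point sampling plus k-nearest neighbours is an assumption here, and the patch count, patch size, mask ratio, and vocabulary size are hypothetical values. The dVAE Tokenizer and the Transformer backbone are replaced by random stand-ins so that the block runs on its own.

```python
import numpy as np

def farthest_point_sample(points, num_centers, rng):
    """Pick num_centers indices spread out over the cloud (assumed grouping step)."""
    n = points.shape[0]
    chosen = np.empty(num_centers, dtype=np.int64)
    chosen[0] = rng.integers(n)
    dist = np.full(n, np.inf)  # squared distance to the nearest chosen center
    for i in range(1, num_centers):
        diff = points - points[chosen[i - 1]]
        dist = np.minimum(dist, np.einsum("ij,ij->i", diff, diff))
        chosen[i] = int(np.argmax(dist))
    return chosen

def group_patches(points, num_patches, patch_size, rng):
    """Split an (N, 3) cloud into num_patches local patches of patch_size points."""
    centers = points[farthest_point_sample(points, num_patches, rng)]   # (G, 3)
    d2 = ((centers[:, None, :] - points[None, :, :]) ** 2).sum(-1)      # (G, N)
    knn_idx = np.argsort(d2, axis=1)[:, :patch_size]                    # (G, K)
    patches = points[knn_idx] - centers[:, None, :]  # coordinates relative to center
    return centers, patches

def random_patch_mask(num_patches, mask_ratio, rng):
    """Boolean mask with True at the patches hidden from the backbone."""
    num_masked = int(round(num_patches * mask_ratio))
    mask = np.zeros(num_patches, dtype=bool)
    mask[rng.choice(num_patches, size=num_masked, replace=False)] = True
    return mask

def masked_token_loss(logits, target_tokens, mask):
    """Mean cross-entropy between predictions and target tokens, masked positions only."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    picked = log_probs[np.arange(len(target_tokens)), target_tokens]
    return float(-picked[mask].mean())

# Usage with hypothetical sizes and random stand-ins for the learned parts.
rng = np.random.default_rng(0)
cloud = rng.normal(size=(1024, 3))
centers, patches = group_patches(cloud, num_patches=64, patch_size=32, rng=rng)
mask = random_patch_mask(64, mask_ratio=0.4, rng=rng)

vocab_size = 128                                   # hypothetical token vocabulary
target_tokens = rng.integers(vocab_size, size=64)  # stand-in for Tokenizer output
logits = rng.normal(size=(64, vocab_size))         # stand-in for backbone predictions
loss = masked_token_loss(logits, target_tokens, mask)
```

In a real setup the target tokens would come from the trained Tokenizer and the logits from the Transformer fed with the masked patch sequence. Only the loss computation over masked positions carries over directly from this sketch.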
Taxonomy
Topics: 3D Shape Modeling and Analysis · Computer Graphics and Visualization Techniques · Human Pose and Action Recognition
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Attention Dropout · Weight Decay · Linear Warmup With Linear Decay · Absolute Position Encodings · WordPiece · Label Smoothing
