Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling

Xumin Yu; Lulu Tang; Yongming Rao; Tiejun Huang; Jie Zhou; Jiwen Lu

arXiv:2111.14819 · cs.CV · June 7, 2022 · 49 citations

TL;DR

Point-BERT introduces a BERT-inspired pre-training method for 3D point cloud Transformers using masked point modeling, significantly enhancing classification accuracy and transferability to new tasks.

Contribution

It proposes a novel Masked Point Modeling pre-training strategy with a point tokenizer, improving point cloud Transformer performance with fewer handcrafted designs.

Findings

01. Achieves 93.8% accuracy on ModelNet40
02. Surpasses existing models on ScanObjectNN
03. Enhances few-shot point cloud classification

Abstract

We present Point-BERT, a new paradigm for learning Transformers to generalize the concept of BERT to 3D point cloud. Inspired by BERT, we devise a Masked Point Modeling (MPM) task to pre-train point cloud Transformers. Specifically, we first divide a point cloud into several local point patches, and a point cloud Tokenizer with a discrete Variational AutoEncoder (dVAE) is designed to generate discrete point tokens containing meaningful local information. Then, we randomly mask out some patches of input point clouds and feed them into the backbone Transformers. The pre-training objective is to recover the original point tokens at the masked locations under the supervision of point tokens obtained by the Tokenizer. Extensive experiments demonstrate that the proposed BERT-style pre-training strategy significantly improves the performance of standard point cloud Transformers. Equipped with…
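
The masked point modeling steps described in the abstract (patching, masking, token prediction at masked locations) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes farthest point sampling plus nearest-neighbour grouping as one plausible way to form local patches, it treats the dVAE Tokenizer output as given integer token ids, and it stands in for the Transformer with a supplied table of predicted probabilities. All function names, the patch count, patch size, and mask ratio are hypothetical choices for illustration only.

```python
import math
import random


def farthest_point_sampling(points, num_centers, rng):
    """Greedily pick indices of well-spread points (assumed patch centers)."""
    first = rng.randrange(len(points))
    centers = [first]
    # Distance from every point to its nearest chosen center so far.
    min_dist = [math.dist(p, points[first]) for p in points]
    while len(centers) < num_centers:
        nxt = max(range(len(points)), key=lambda i: min_dist[i])
        centers.append(nxt)
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[nxt]))
    return centers


def make_patches(points, num_patches, patch_size, rng):
    """Group points into local patches: each center with its nearest neighbours."""
    patches = []
    for c in farthest_point_sampling(points, num_patches, rng):
        order = sorted(range(len(points)),
                       key=lambda i: math.dist(points[i], points[c]))
        patches.append(order[:patch_size])
    return patches


def random_mask(num_patches, mask_ratio, rng):
    """Return one boolean per patch; True means the patch is masked out."""
    num_masked = int(round(num_patches * mask_ratio))
    masked = set(rng.sample(range(num_patches), num_masked))
    return [i in masked for i in range(num_patches)]


def masked_token_loss(pred_probs, target_tokens, mask):
    """Mean cross-entropy between predictions and Tokenizer ids, masked patches only."""
    losses = [-math.log(pred_probs[i][target_tokens[i]])
              for i in range(len(mask)) if mask[i]]
    return sum(losses) / len(losses) if losses else 0.0


# Toy run: 64 random 3D points, 8 patches of 16 points, half of them masked.
rng = random.Random(0)
points = [(rng.random(), rng.random(), rng.random()) for _ in range(64)]
patches = make_patches(points, num_patches=8, patch_size=16, rng=rng)
mask = random_mask(len(patches), mask_ratio=0.5, rng=rng)

# Hypothetical stand-ins: Tokenizer ids from a 4-entry vocabulary and
# uniform predictions in place of a real backbone Transformer.
target_tokens = [0] * len(patches)
pred_probs = [[0.25] * 4 for _ in patches]
loss = masked_token_loss(pred_probs, target_tokens, mask)
```

In this sketch only the masked positions contribute to the loss, which mirrors the stated objective of recovering the original point tokens at the masked locations. A real setup would replace the uniform `pred_probs` with the backbone's output and `target_tokens` with the ids produced by the trained Tokenizer.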

Taxonomy

Topics: 3D Shape Modeling and Analysis · Computer Graphics and Visualization Techniques · Human Pose and Action Recognition

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Attention Dropout · Weight Decay · Linear Warmup With Linear Decay · Absolute Position Encodings · WordPiece · Label Smoothing