Towards Compact 3D Representations via Point Feature Enhancement Masked Autoencoders
Yaohua Zha, Huizhen Ji, Jinmin Li, Rongsheng Li, Tao Dai, Bin Chen, Zhi Wang, Shu-Tao Xia

TL;DR
This paper introduces Point-FEMAE, a novel point cloud autoencoder that enhances 3D feature compactness and efficiency by combining global and local branches with a shared Transformer encoder, outperforming existing methods.
Contribution
It proposes a simple yet effective point feature enhancement autoencoder with a local module and shared encoder, improving 3D representation compactness and pre-training efficiency.
Findings
Outperforms baseline Point-MAE by over 5% on ScanObjectNN variants.
Significantly improves pre-training efficiency compared to cross-modal methods.
Achieves state-of-the-art results in downstream 3D recognition tasks.
Abstract
Learning 3D representation plays a critical role in masked autoencoder (MAE) based pre-training methods for point cloud, including single-modal and cross-modal based MAE. Specifically, although cross-modal MAE methods learn strong 3D representations via the auxiliary of other modal knowledge, they often suffer from heavy computational burdens and heavily rely on massive cross-modal data pairs that are often unavailable, which hinders their applications in practice. Instead, single-modal methods with solely point clouds as input are preferred in real applications due to their simplicity and efficiency. However, such methods easily suffer from limited 3D representations with global random mask input. To learn compact 3D representations, we propose a simple yet effective Point Feature Enhancement Masked Autoencoders (Point-FEMAE), which mainly consists of a global branch and a local branch…
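The abstract attributes the limited representations of single-modal methods to their global random mask input. As a rough illustration of what such a mask does, the sketch below assumes the point cloud has already been grouped into patches, as is common in MAE-style pipelines, and simply selects a uniformly random subset of patch indices to hide from the encoder. The function name `global_random_mask`, the patch count of 64, and the mask ratio of 0.6 are hypothetical choices for illustration, not values taken from the paper, and this is not the authors' implementation.

```python
import random


def global_random_mask(num_patches, mask_ratio, seed=None):
    """Split patch indices into (visible, masked) with a uniform random mask.

    Hypothetical helper for illustration only: every patch has the same
    chance of being masked, regardless of where it lies on the shape.
    """
    if not 0.0 <= mask_ratio < 1.0:
        raise ValueError("mask_ratio must be in [0, 1)")
    rng = random.Random(seed)
    num_masked = int(num_patches * mask_ratio)
    order = list(range(num_patches))
    rng.shuffle(order)  # random permutation of patch indices
    masked_idx = sorted(order[:num_masked])
    visible_idx = sorted(order[num_masked:])
    return visible_idx, masked_idx


# Example: 64 patches with an assumed mask ratio of 0.6.
visible, masked = global_random_mask(num_patches=64, mask_ratio=0.6, seed=0)
print(len(visible), len(masked))
```

In an MAE-style setup, only the visible patches would be passed to the encoder, and the masked ones would serve as reconstruction targets. Because the selection is uniform over the whole shape, the visible patches end up scattered globally rather than concentrated in any one region.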
Taxonomy
Topics: 3D Shape Modeling and Analysis · 3D Surveying and Cultural Heritage · Optical measurement and interference techniques
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Dense Connections · Position-Wise Feed-Forward Layer · Absolute Position Encodings · Label Smoothing · Adam · Convolution · Byte Pair Encoding
