Towards Compact 3D Representations via Point Feature Enhancement Masked Autoencoders

Yaohua Zha; Huizhen Ji; Jinmin Li; Rongsheng Li; Tao Dai; Bin Chen; Zhi Wang; Shu-Tao Xia

arXiv:2312.10726·cs.CV·December 19, 2023·1 cites

TL;DR

This paper introduces Point-FEMAE, a single-modal masked autoencoder for point clouds that learns more compact 3D representations by combining a global branch and a local branch through a shared Transformer encoder. It is reported to outperform the Point-MAE baseline.

Contribution

It proposes a simple yet effective point feature enhancement autoencoder with a local module and shared encoder, improving 3D representation compactness and pre-training efficiency.

Findings

01 Outperforms baseline Point-MAE by over 5% on ScanObjectNN variants.

02 Significantly improves pre-training efficiency compared to cross-modal methods.

03 Achieves state-of-the-art results in downstream 3D recognition tasks.

Abstract

Learning 3D representations plays a critical role in masked autoencoder (MAE) based pre-training methods for point clouds, including single-modal and cross-modal MAE. Although cross-modal MAE methods learn strong 3D representations with the aid of knowledge from other modalities, they often carry a heavy computational burden and rely on massive cross-modal data pairs that are often unavailable, which hinders their application in practice. Single-modal methods that take only point clouds as input are therefore preferred in real applications for their simplicity and efficiency. However, with a global random mask as input, such methods tend to learn limited 3D representations. To learn compact 3D representations, we propose Point Feature Enhancement Masked Autoencoders (Point-FEMAE), a simple yet effective method that mainly consists of a global branch and a local branch…
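The contrast between the two branches can be illustrated with a minimal masking sketch. The abstract above is truncated and only names a "global random mask", so everything about the local branch here is an assumption: the sketch supposes the local branch masks a spatially contiguous block of patches around a random seed patch. The function names, the 64 patch centers, and the 0.6 mask ratio are hypothetical and are not taken from the paper or its repository.

```python
import math
import random

def global_random_mask(num_patches, mask_ratio, rng):
    """Return a boolean list; True marks a patch masked uniformly at random."""
    num_masked = int(num_patches * mask_ratio)
    masked = set(rng.sample(range(num_patches), num_masked))
    return [i in masked for i in range(num_patches)]

def local_block_mask(centers, mask_ratio, rng):
    """ASSUMPTION: mask the patches nearest to a random seed patch,
    so the masked region forms one contiguous block in 3D space."""
    num_patches = len(centers)
    num_masked = int(num_patches * mask_ratio)
    seed = centers[rng.randrange(num_patches)]
    # Sort patch indices by Euclidean distance of their center to the seed.
    order = sorted(range(num_patches), key=lambda i: math.dist(centers[i], seed))
    masked = set(order[:num_masked])
    return [i in masked for i in range(num_patches)]

# Hypothetical input: 64 patch centers sampled in the unit cube.
rng = random.Random(0)
centers = [(rng.random(), rng.random(), rng.random()) for _ in range(64)]

g_mask = global_random_mask(len(centers), 0.6, rng)  # scattered masked patches
l_mask = local_block_mask(centers, 0.6, rng)         # one contiguous masked block
```

Under these assumptions, both masks hide the same number of patches, and the visible patches from each would be passed through the same (shared) Transformer encoder. Only the spatial layout of the hidden region differs between the two branches.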

Code & Models

Repositories

zyh16143998882/aaai24-pointfemae
PyTorch · Official

Taxonomy

Topics: 3D Shape Modeling and Analysis · 3D Surveying and Cultural Heritage · Optical measurement and interference techniques

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Dense Connections · Position-Wise Feed-Forward Layer · Absolute Position Encodings · Label Smoothing · Adam · Convolution · Byte Pair Encoding