MeshMAE: Masked Autoencoders for 3D Mesh Data Analysis
Yaqian Liang, Shanshan Zhao, Baosheng Yu, Jing Zhang, and Fazhi He

TL;DR
MeshMAE introduces a self-supervised pre-training approach for 3D mesh data using masked autoencoders with Transformers, significantly improving performance on classification and segmentation tasks.
Contribution
The paper adapts the Vision Transformer to 3D mesh data (the Mesh Transformer) and proposes MeshMAE, a masked autoencoder framework for self-supervised learning on meshes.
Findings
Achieves state-of-the-art results on mesh classification.
Demonstrates effective mesh segmentation performance.
Validates key design choices through ablation studies.
Abstract
Recently, self-supervised pre-training has advanced Vision Transformers on various tasks w.r.t. different data modalities, e.g., image and 3D point cloud data. In this paper, we explore this learning paradigm for 3D mesh data analysis based on Transformers. Since applying Transformer architectures to new modalities is usually non-trivial, we first adapt the Vision Transformer to 3D mesh data processing, i.e., Mesh Transformer. Specifically, we divide a mesh into several non-overlapping local patches, each containing the same number of faces, and use the 3D position of each patch's center point to form positional embeddings. Inspired by MAE, we explore how pre-training on 3D mesh data with the Transformer-based structure benefits downstream 3D mesh analysis tasks. We first randomly mask some patches of the mesh and feed the corrupted mesh into Mesh Transformers. Then, through…
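The patching, center-point, and random-masking steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The data layout (each patch as a list of faces, each face as three (x, y, z) vertices), the helper names `make_toy_patches`, `patch_centers`, and `random_mask`, the choice of the vertex mean as the patch center, and the 0.5 mask ratio are all assumptions made for illustration; the abstract does not specify them.

```python
import random


def make_toy_patches(num_patches, faces_per_patch, rng):
    """Build a toy 'mesh' (hypothetical layout): a list of patches, each patch a
    list of faces, each face a tuple of three (x, y, z) vertices."""
    return [
        [
            tuple(tuple(rng.uniform(-1.0, 1.0) for _ in range(3)) for _ in range(3))
            for _ in range(faces_per_patch)
        ]
        for _ in range(num_patches)
    ]


def patch_centers(patches):
    """Return one 3D point per patch, here taken as the mean of all face
    vertices in the patch (an assumed definition of the patch center)."""
    centers = []
    for faces in patches:
        verts = [v for face in faces for v in face]
        n = len(verts)
        centers.append(tuple(sum(v[k] for v in verts) / n for k in range(3)))
    return centers


def random_mask(num_patches, mask_ratio, rng):
    """Randomly split patch indices into (visible, masked) index lists."""
    num_masked = int(round(num_patches * mask_ratio))
    order = list(range(num_patches))
    rng.shuffle(order)
    masked = sorted(order[:num_masked])
    visible = sorted(order[num_masked:])
    return visible, masked


rng = random.Random(0)
patches = make_toy_patches(num_patches=8, faces_per_patch=4, rng=rng)
centers = patch_centers(patches)

# mask_ratio=0.5 is an illustrative value, not one taken from the abstract.
visible, masked = random_mask(len(patches), mask_ratio=0.5, rng=rng)

# Only visible patches, paired with their center positions, would be passed to
# an encoder; the masked indices mark what a decoder would have to reconstruct.
encoder_input = [(patches[i], centers[i]) for i in visible]
print("visible:", visible)
print("masked:", masked)
```

In a real model the center coordinates would be mapped to positional embeddings by a learned layer and the face features would be embedded before entering the Transformer; the sketch stops at the index bookkeeping because the abstract excerpt here is truncated before the reconstruction step.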
Taxonomy
Topics: 3D Shape Modeling and Analysis · Remote Sensing and LiDAR Applications · Advanced Neural Network Applications
Methods: Attention Is All You Need · Masked autoencoder · Linear Layer · Absolute Position Encodings · Dropout · Byte Pair Encoding · Position-Wise Feed-Forward Layer · Adam · Label Smoothing · Dense Connections
