TL;DR
Shape is a self-supervised 3D geometry foundation model for industrial CAD analysis that learns dense surface mesh embeddings for accurate, explainable, and generalizable 3D representations.
Contribution
Shape introduces a self-supervised model that combines a structured 3D latent grid, a multi-scale geometry-aware tokenizer (MAGNO), and a transformer processor for CAD surface understanding.
Findings
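Achieves 98.1% top-1 retrieval accuracy on a held-out split of 2,983 CAD meshes under the Wang-Isola protocol.
Pretraining on 61,052 CAD meshes from Thingi10K, MFCAD, and Fusion360 yields a held-out reconstruction R² of 0.729.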
Per-dimension normalization is critical for model performance.
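To make the normalization and reconstruction findings concrete, here is a minimal sketch. It assumes per-dimension normalization means z-scoring each geometry statistic independently across the dataset, and that reconstruction quality is the standard coefficient of determination. Neither detail is confirmed by this summary. The function names and the toy statistics are hypothetical.

```python
import math


def per_dimension_normalize(rows, eps=1e-8):
    """Z-score every feature dimension (column) independently.

    Assumption: this is what "per-dimension normalization" refers to;
    the paper's exact scheme is not stated in this summary.
    """
    n = len(rows)
    dims = len(rows[0])
    means = [sum(r[d] for r in rows) / n for d in range(dims)]
    stds = [
        math.sqrt(sum((r[d] - means[d]) ** 2 for r in rows) / n)
        for d in range(dims)
    ]
    return [
        [(r[d] - means[d]) / (stds[d] + eps) for d in range(dims)]
        for r in rows
    ]


def r2_score(targets, predictions):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(targets) / len(targets)
    ss_res = sum((t - p) ** 2 for t, p in zip(targets, predictions))
    ss_tot = sum((t - mean_t) ** 2 for t in targets)
    return 1.0 - ss_res / ss_tot


# Hypothetical per-token statistics with very different scales,
# e.g. [surface area, mean curvature].
stats = [[100.0, 0.01], [200.0, 0.03], [300.0, 0.02]]
normalized = per_dimension_normalize(stats)
```

Without this step, a large-magnitude statistic such as area would dominate a squared-error reconstruction loss, and small-magnitude statistics would contribute almost nothing. That is one plausible reason the normalization matters.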
Abstract
Industrial CAD workflows require robust, generalizable 3D geometric representations supporting accuracy and explainability. We introduce Shape, a self-supervised foundation model converting surface meshes into dense per-token embeddings. Shape combines a structured 3D latent grid, a multi-scale geometry-aware tokenizer (MAGNO) with cross-attention, and a transformer processor using grouped-query attention and RMSNorm. A learned reconstruction prior enables per-region attribution for explainable predictions. Pretraining uses masked-token reconstruction of normalized geometry statistics and multi-resolution contrastive consistency. The 10.9M-parameter backbone is pretrained on 61,052 CAD meshes from Thingi10K, MFCAD, and Fusion360. On a held-out split of 2,983 meshes, Shape achieves reconstruction R² = 0.729 and 98.1% top-1 retrieval under the Wang-Isola protocol, with near-zero…
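One way to read the top-1 retrieval figure is sketched below. It assumes each held-out mesh is embedded twice (for example at two resolutions), and that a query counts as correct when its nearest gallery embedding by cosine similarity is the other view of the same mesh. This pairing scheme, the function names, and the toy embeddings are illustrative assumptions. The abstract does not spell out the evaluation procedure.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def top1_retrieval_accuracy(queries, gallery):
    """Fraction of queries whose most similar gallery item shares their index.

    Assumption: queries[i] and gallery[i] are two views of the same mesh.
    """
    q = [l2_normalize(v) for v in queries]
    g = [l2_normalize(v) for v in gallery]
    hits = 0
    for i, qv in enumerate(q):
        sims = [sum(a * b for a, b in zip(qv, gv)) for gv in g]
        best = max(range(len(g)), key=sims.__getitem__)
        hits += int(best == i)
    return hits / len(q)


# Hypothetical 2-D embeddings of three meshes under two views.
view_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
view_b = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.5]]
accuracy = top1_retrieval_accuracy(view_a, view_b)
```

On real data the gallery would hold one embedding per held-out mesh, and the similarity search would normally be done as a single matrix product instead of nested loops.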
