ARKit LabelMaker: A New Scale for Indoor 3D Scene Understanding
Guangda Ji, Silvan Weder, Francis Engelmann, Marc Pollefeys, Hermann Blum

TL;DR
ARKit LabelMaker introduces a large-scale, densely annotated 3D dataset for indoor scene understanding, significantly enhancing neural network training and achieving state-of-the-art segmentation results.
Contribution
The paper presents ARKit LabelMaker, a new extensive 3D dataset with dense semantic labels, enabling improved training and performance in indoor 3D scene understanding.
Findings
Achieves state-of-the-art 3D semantic segmentation scores on ScanNet and ScanNet200.
Training on the dataset improves accuracy across various architectures.
Notable gains on tail classes in semantic segmentation.
Abstract
Neural network performance scales with both model size and data volume, as shown in both language and image processing. This requires scaling-friendly architectures and large datasets. While transformers have been adapted for 3D vision, a "GPT-moment" remains elusive due to limited training data. We introduce ARKit LabelMaker, a large-scale real-world 3D dataset with dense semantic annotation that is more than three times larger than the prior largest dataset. Specifically, we extend ARKitScenes with automatically generated dense 3D labels using an extended LabelMaker pipeline, tailored for large-scale pre-training. Training on our dataset improves accuracy across architectures, achieving state-of-the-art 3D semantic segmentation scores on ScanNet and ScanNet200, with notable gains on tail classes. Our code is available at https://labelmaker.org and our dataset at…
Taxonomy
Topics: 3D Surveying and Cultural Heritage · Advanced Neural Network Applications · Image Processing and 3D Reconstruction
