Fast and Efficient: Mask Neural Fields for 3D Scene Segmentation
Zihan Gao, Lingling Li, Licheng Jiao, Fang Liu, Xu Liu, Wenping Ma, Yuwei Guo, Shuyuan Yang

TL;DR
MaskField introduces a novel approach for 3D scene segmentation that efficiently leverages foundation models by decomposing mask and semantic features, avoiding complex regularization, and achieving faster convergence than previous methods.
Contribution
The paper proposes MaskField, a method that decouples mask distillation from semantic feature distillation, improving both the efficiency and the accuracy of 3D scene segmentation distilled from 2D foundation models.
Findings
Surpasses prior state-of-the-art methods in 3D segmentation accuracy.
Achieves remarkably fast convergence during training.
Naturally incorporates SAM-segmented object shapes without extra regularization.
Abstract
Understanding 3D scenes is a crucial challenge in computer vision research with applications spanning multiple domains. Recent advancements in distilling 2D vision-language foundation models into neural fields, like NeRF and 3DGS, enable open-vocabulary segmentation of 3D scenes from 2D multi-view images without the need for precise 3D annotations. However, while effective, these methods typically rely on the per-pixel distillation of high-dimensional CLIP features, introducing ambiguity and necessitating complex regularization strategies, which adds inefficiency during training. This paper presents MaskField, which enables efficient 3D open-vocabulary segmentation with neural fields from a novel perspective. Unlike previous methods, MaskField decomposes the distillation of mask and semantic features from foundation models by formulating a mask feature field and queries. MaskField…
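The decomposition described in the abstract can be illustrated with a rough sketch. This is not the authors' implementation: it only assumes that a mask feature field yields one rendered feature vector per pixel, that each query is a vector of the same dimension, and that soft masks come from a softmax over query-feature dot products. The function names `masks_from_queries` and `label_masks`, the array shapes, and the cosine-similarity labeling step are all hypothetical choices made for illustration.

```python
import numpy as np


def masks_from_queries(pixel_features, queries):
    """Turn rendered per-pixel mask features into soft masks, one per query.

    pixel_features: (H, W, D) array, assumed to be rendered from a mask feature field.
    queries:        (N, D) array of query vectors (hypothetical learned parameters).
    Returns an (N, H, W) array whose values sum to 1 over the N queries at each pixel.
    """
    # Dot product between every query and every pixel feature.
    logits = np.einsum("hwd,nd->nhw", pixel_features, queries)
    # Numerically stable softmax over the query axis.
    logits = logits - logits.max(axis=0, keepdims=True)
    weights = np.exp(logits)
    return weights / weights.sum(axis=0, keepdims=True)


def label_masks(mask_embeddings, text_embeddings):
    """Assign each mask the index of its most similar text prompt.

    mask_embeddings: (N, E) array, one semantic (e.g. CLIP-like) vector per mask.
    text_embeddings: (T, E) array, one vector per text prompt.
    Returns an (N,) integer array of best-matching prompt indices.
    """
    m = mask_embeddings / np.linalg.norm(mask_embeddings, axis=1, keepdims=True)
    t = text_embeddings / np.linalg.norm(text_embeddings, axis=1, keepdims=True)
    # Cosine similarity, then pick the best prompt per mask.
    return (m @ t.T).argmax(axis=1)
```

In this sketch the semantic comparison happens once per mask rather than once per pixel, which is one way to read the abstract's contrast with per-pixel distillation of high-dimensional CLIP features.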
Taxonomy
Topics: Advanced Neural Network Applications · Industrial Vision Systems and Defect Detection · Image and Object Detection Techniques
Methods: Contrastive Language-Image Pre-training · Segment Anything Model
