3D Weakly Supervised Semantic Segmentation with 2D Vision-Language Guidance
Xiaoxu Xu, Yitian Yuan, Jinlong Li, Qiudan Zhang, Zequn Jie, Lin Ma, Hao Tang, Nicu Sebe, and Xu Wang

TL;DR
This paper introduces 3DSS-VLG, a novel weakly supervised 3D semantic segmentation method leveraging 2D vision-language models to improve feature alignment and supervision, achieving state-of-the-art results on S3DIS and ScanNet datasets.
Contribution
The paper presents 3DSS-VLG as the first method to use textual semantic information for weakly supervised 3D segmentation, combining image and text embeddings for stronger supervision.
Findings
Achieves state-of-the-art performance on S3DIS and ScanNet datasets.
Demonstrates strong generalization capability across datasets.
Effectively aligns 3D embeddings with text and image spaces.
Abstract
In this paper, we propose 3DSS-VLG, a weakly supervised approach for 3D Semantic Segmentation with 2D Vision-Language Guidance, in which a 3D model predicts a dense embedding for each point that is co-embedded with both the aligned image and text spaces of a 2D vision-language model. Specifically, our method exploits the superior generalization ability of 2D vision-language models and proposes the Embeddings Soft-Guidance Stage, which uses that ability to implicitly align 3D embeddings with text embeddings. Moreover, we introduce the Embeddings Specialization Stage, which purifies the feature representation with the help of a given scene-level label, yielding a better feature supervised by the corresponding text embedding. Thus, the 3D model gains informative supervision from both the image embedding and the text embedding, leading to competitive segmentation…
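The general idea of using text embeddings as class prototypes, with the scene-level label narrowing the candidate classes, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy 3-dimensional embeddings, and the hard filtering by scene label are all assumptions made for illustration, and a real system would take text embeddings from a 2D vision-language model and point embeddings from a trained 3D network.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def label_points(point_embs, text_embs, scene_labels):
    """Assign each point the most similar class among those in the scene.

    point_embs   : list of per-point embedding vectors (hypothetical 3D model output)
    text_embs    : dict mapping class name -> text embedding (hypothetical values)
    scene_labels : set of class names known to occur in the scene
    """
    allowed = sorted(scene_labels)  # sorted for deterministic tie-breaking
    return [max(allowed, key=lambda c: cosine(p, text_embs[c])) for p in point_embs]


# Toy data, invented for illustration only.
text_embs = {
    "chair": [1.0, 0.0, 0.0],
    "table": [0.0, 1.0, 0.0],
    "sofa": [0.9, 0.1, 0.0],
}
point_embs = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0]]

# The scene-level label says only chairs and tables are present.
with_scene_label = label_points(point_embs, text_embs, {"chair", "table"})
# Without that constraint, every class is a candidate.
without_scene_label = label_points(point_embs, text_embs, set(text_embs))
```

In this toy case the first point is closest to "sofa" when all classes compete, but the scene-level label rules that class out and the point falls back to "chair". That is the intuition behind using a scene-level label to clean up per-point predictions before they serve as supervision.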
Taxonomy
Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques
Methods: ALIGN
