COS3D: Collaborative Open-Vocabulary 3D Segmentation
Runsong Zhu, Ka-Hei Hui, Zhengzhe Liu, Qianyi Wu, Weiliang Tang, Shi Qiu, Pheng-Ann Heng, Chi-Wing Fu

TL;DR
COS3D is a collaborative framework for open-vocabulary 3D segmentation that combines language and segmentation cues. It outperforms existing methods on benchmark datasets and extends to applications such as image-based 3D segmentation and robotics.
Contribution
The paper proposes a collaborative field, comprising an instance field and a language field, together with a two-stage training strategy, to improve open-vocabulary 3D segmentation.
Findings
Outperforms existing methods on benchmark datasets
Demonstrates versatility in applications like image-based 3D segmentation and robotics
Achieves high-quality prompt-segmentation inference through adaptive refinement
Abstract
Open-vocabulary 3D segmentation is a fundamental yet challenging task, requiring a mutual understanding of both segmentation and language. However, existing Gaussian-splatting-based methods rely either on a single 3D language field, leading to inferior segmentation, or on pre-computed class-agnostic segmentations, which suffer from error accumulation. To address these limitations, we present COS3D, a new collaborative prompt-segmentation framework that effectively integrates complementary language and segmentation cues throughout its entire pipeline. We first introduce the new concept of a collaborative field, comprising an instance field and a language field, as the cornerstone for collaboration. During training, to effectively construct the collaborative field, our key idea is to capture the intrinsic relationship between the instance field and the language field, through a…
