COS3D: Collaborative Open-Vocabulary 3D Segmentation

Runsong Zhu; Ka-Hei Hui; Zhengzhe Liu; Qianyi Wu; Weiliang Tang; Shi Qiu; Pheng-Ann Heng; Chi-Wing Fu

arXiv:2510.20238·cs.CV·October 24, 2025

TL;DR

COS3D introduces a collaborative framework for open-vocabulary 3D segmentation that combines language and segmentation cues, outperforming existing methods on benchmark datasets and extending to applications such as image-based 3D segmentation and robotics.

Contribution

The paper proposes a novel collaborative field concept and a two-stage training strategy to enhance open-vocabulary 3D segmentation performance.

Findings

01. Outperforms existing methods on benchmark datasets

02. Demonstrates versatility in applications like image-based 3D segmentation and robotics

03. Achieves high-quality prompt-segmentation inference through adaptive refinement

Abstract

Open-vocabulary 3D segmentation is a fundamental yet challenging task, requiring a mutual understanding of both segmentation and language. However, existing Gaussian-splatting-based methods rely either on a single 3D language field, leading to inferior segmentation, or on pre-computed class-agnostic segmentations, suffering from error accumulation. To address these limitations, we present COS3D, a new collaborative prompt-segmentation framework that effectively integrates complementary language and segmentation cues throughout its entire pipeline. We first introduce the new concept of a collaborative field, comprising an instance field and a language field, as the cornerstone for collaboration. During training, to effectively construct the collaborative field, our key idea is to capture the intrinsic relationship between the instance field and the language field, through a…
