CoSMo3D: Open-World Promptable 3D Semantic Part Segmentation through LLM-Guided Canonical Spatial Modeling

Li Jin; Weikai Chen; Yujie Wang; Yingda Yin; Zeyu Hu; Runze Zhang; Keyang Luo; Shengju Qian; Xin Wang; Xueying Qin

arXiv:2603.01205 · cs.CV · March 3, 2026

TL;DR

This paper introduces CoSMo3D, an approach to open-world promptable 3D semantic part segmentation. It learns a canonical spatial frame guided by large language models, which makes part semantics more stable and more transferable in open-world scenarios.

Contribution

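The paper proposes a data-driven method for canonical space perception. LLM-guided intra- and cross-category alignment produces a unified canonical dataset, and a dual-branch architecture with canonical map anchoring and canonical box calibration builds canonicality into the segmentation model itself.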

Findings

01

Achieves state-of-the-art performance in open-world 3D segmentation.

02

Creates a canonical dataset across 200 categories.

03

Improves stability and transferability of part semantics.

Abstract

Open-world promptable 3D semantic segmentation remains brittle because semantics are inferred in the input sensor coordinates. Humans, in contrast, interpret parts through their functional roles in a canonical space: wings extend laterally, handles protrude to the side, and legs support from below. Psychophysical evidence shows that we mentally rotate objects into canonical frames to reveal these roles. To close this gap, we propose CoSMo3D, which attains canonical space perception by inducing a latent canonical reference frame learned directly from data. By construction, we create a unified canonical dataset through LLM-guided intra- and cross-category alignment, exposing canonical spatial regularities across 200 categories. By induction, we realize canonicality inside the model through a dual-branch architecture with canonical map anchoring and canonical box calibration, collapsing pose…
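The abstract's core argument is that a relation like "legs support from below" holds reliably only after an object is expressed in a canonical frame, not in raw sensor coordinates. The sketch below shows this for a toy object. It makes two assumptions that do not come from the paper. First, the canonical-to-sensor pose (R, t) is treated as known, whereas CoSMo3D learns its canonical map and canonical box branches. Second, the unit-box step uses a NOCS-style normalization. The helpers `to_canonical` and `normalize_to_unit_box` are hypothetical names chosen for illustration.

```python
import numpy as np

def rotation_x(theta: float) -> np.ndarray:
    """Rotation matrix about the x-axis by `theta` radians."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[1.0, 0.0, 0.0],
                     [0.0, c, -s],
                     [0.0, s, c]])

def to_canonical(points_sensor: np.ndarray, R: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Hypothetical helper: map (N, 3) sensor-frame points into the canonical frame.

    Assumes a known pose with p_sensor = R @ p_canonical + t, so
    p_canonical = R^T (p_sensor - t); in row-vector form that is (p - t) @ R.
    """
    return (points_sensor - t) @ R

def normalize_to_unit_box(points_canonical: np.ndarray) -> np.ndarray:
    """Hypothetical 'canonical box' step (NOCS-style assumption, not the paper's):
    shift to the min corner and scale uniformly by the longest extent into [0, 1]^3."""
    lo = points_canonical.min(axis=0)
    extent = (points_canonical.max(axis=0) - lo).max()
    return (points_canonical - lo) / extent

# Toy "stool" in a z-up canonical frame: 4 seat points above 4 leg points.
seat = np.array([[0, 0, 0.5], [1, 0, 0.5], [0, 1, 0.5], [1, 1, 0.5]], dtype=float)
legs = np.array([[0, 0, 0.0], [1, 0, 0.0], [0, 1, 0.0], [1, 1, 0.0]], dtype=float)
canonical = np.vstack([seat, legs])

# Observe it upside down and offset, as an arbitrary sensor pose might.
R = rotation_x(np.pi)
t = np.array([2.0, -1.0, 3.0])
sensor = canonical @ R.T + t          # row-vector form of R @ p + t

# In sensor coordinates the legs now sit *above* the seat along z...
legs_above_in_sensor = sensor[4:, 2].mean() > sensor[:4, 2].mean()

# ...but after canonicalization "legs support from below" holds again.
recovered = to_canonical(sensor, R, t)
legs_below_in_canonical = recovered[4:, 2].mean() < recovered[:4, 2].mean()
box = normalize_to_unit_box(recovered)
```

The point of the example is that a fixed geometric rule such as "lowest along z" gives the wrong answer in the sensor frame and the right answer in the canonical frame. According to the abstract, the method has to learn that canonical frame from data rather than receive it as a given pose.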

Taxonomy

Topics: 3D Shape Modeling and Analysis · Robot Manipulation and Learning · Human Pose and Action Recognition