SegPoint: Segment Any Point Cloud via Large Language Model
Shuting He, Henghui Ding, Xudong Jiang, Bihan Wen

TL;DR
SegPoint leverages large language models to perform diverse 3D point cloud segmentation tasks, including implicit instruction understanding, within a unified framework, and introduces a new benchmark for evaluating such capabilities.
Contribution
This work introduces SegPoint, the first unified model capable of handling multiple 3D segmentation tasks using LLM reasoning, and presents Instruct3D, a new benchmark for implicit instruction-based segmentation.
Findings
Achieves competitive results on ScanRefer and ScanNet benchmarks.
Outperforms existing methods on the Instruct3D dataset.
Demonstrates the ability to understand complex implicit instructions.
Abstract
Despite significant progress in 3D point cloud segmentation, existing methods primarily address specific tasks and depend on explicit instructions to identify targets, lacking the capability to infer and understand implicit user intentions in a unified framework. In this work, we propose a model, called SegPoint, that leverages the reasoning capabilities of a multi-modal Large Language Model (LLM) to produce point-wise segmentation masks across a diverse range of tasks: 1) 3D instruction segmentation, 2) 3D referring segmentation, 3) 3D semantic segmentation, and 4) 3D open-vocabulary semantic segmentation. To advance 3D instruction research, we introduce a new benchmark, Instruct3D, designed to evaluate segmentation performance from complex and implicit instructional texts, featuring 2,565 point cloud-instruction pairs. Our experimental results demonstrate that SegPoint achieves…
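As a rough illustration of how point-wise segmentation masks like those described above might be scored, the sketch below computes intersection over union (IoU) between a predicted mask and a ground-truth mask, each holding one boolean per point. The abstract excerpt does not state which metric Instruct3D uses, so IoU is an assumption here, and the names `mask_iou` and `mean_iou` are hypothetical helpers, not part of SegPoint.

```python
from typing import Iterable, Sequence, Tuple


def mask_iou(pred: Sequence[bool], target: Sequence[bool]) -> float:
    """IoU of two point-wise boolean masks over the same point cloud."""
    if len(pred) != len(target):
        raise ValueError("masks must cover the same points")
    # Points selected by both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    # Points selected by at least one mask.
    union = sum(1 for p, t in zip(pred, target) if p or t)
    # Convention assumed here: two empty masks count as a perfect match.
    return 1.0 if union == 0 else intersection / union


def mean_iou(pairs: Iterable[Tuple[Sequence[bool], Sequence[bool]]]) -> float:
    """Average IoU over (prediction, ground truth) mask pairs."""
    scores = [mask_iou(pred, target) for pred, target in pairs]
    if not scores:
        raise ValueError("need at least one mask pair")
    return sum(scores) / len(scores)
```

For example, a prediction covering points 0 and 1 against a target covering points 0 and 2 shares one point out of three selected overall, so `mask_iou([True, True, False, False], [True, False, True, False])` evaluates to one third.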
Taxonomy
Topics: Topic Modeling · Semantic Web and Ontologies
