GeoSAM2: Unleashing the Power of SAM2 for 3D Part Segmentation

Ken Deng; Yunhan Yang; Jingxiang Sun; Xihui Liu; Yebin Liu; Ding Liang; Yan-Pei Cao

arXiv:2508.14036·cs.CV·August 28, 2025

TL;DR

GeoSAM2 is a 3D part segmentation framework that casts the task as multi-view 2D mask prediction guided by simple 2D prompts (clicks or boxes). It reports state-of-the-art class-agnostic results without requiring text prompts, per-shape optimization, or full 3D labels.

Contribution

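The paper presents GeoSAM2, a prompt-controllable 3D part segmentation method. It applies a shared SAM2 backbone, augmented with LoRA and residual geometry fusion, to normal and point maps rendered from predefined viewpoints, then back-projects the predicted 2D masks onto the object and aggregates them across views. The result is fine-grained, interpretable part selection driven by 2D prompts.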

Findings

1. Achieves state-of-the-art class-agnostic performance on PartObjaverse-Tiny and PartNetE.

2. Outperforms both optimization-based and coarse feedforward approaches.

3. Enables explicit, spatially grounded control without full 3D labels.

Abstract

We introduce GeoSAM2, a prompt-controllable framework for 3D part segmentation that casts the task as multi-view 2D mask prediction. Given a textureless object, we render normal and point maps from predefined viewpoints and accept simple 2D prompts (clicks or boxes) to guide part selection. These prompts are processed by a shared SAM2 backbone augmented with LoRA and residual geometry fusion, enabling view-specific reasoning while preserving pretrained priors. The predicted masks are back-projected to the object and aggregated across views. Our method enables fine-grained, part-specific control without requiring text prompts, per-shape optimization, or full 3D labels. In contrast to global clustering or scale-based methods, prompts are explicit, spatially grounded, and interpretable. We achieve state-of-the-art class-agnostic performance on PartObjaverse-Tiny and PartNetE,…
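
The abstract's last pipeline step, back-projecting per-view masks onto the object and aggregating them, can be illustrated with a small sketch. This is not the authors' implementation: it assumes a simple majority vote over mesh faces, and the function name `aggregate_view_masks`, the per-pixel face-index maps, and the integer label encoding are all hypothetical choices made for illustration. The paper may aggregate differently (for example over points, or with confidence weighting).

```python
import numpy as np


def aggregate_view_masks(face_id_maps, label_masks, num_faces, num_labels):
    """Majority-vote a part label for each mesh face from per-view 2D masks.

    Assumed (hypothetical) inputs:
      face_id_maps: list of (H, W) int arrays; each pixel stores the index of
          the mesh face visible there, or -1 for background.
      label_masks: list of (H, W) int arrays; each pixel stores a part label
          in [0, num_labels), or -1 where no part was predicted.
    Returns a (num_faces,) int array; -1 marks faces that received no vote.
    """
    votes = np.zeros((num_faces, num_labels), dtype=np.int64)
    for face_ids, labels in zip(face_id_maps, label_masks):
        # Only pixels that hit the object AND carry a predicted label vote.
        valid = (face_ids >= 0) & (labels >= 0)
        # np.add.at accumulates correctly when the same (face, label)
        # pair appears at several pixels.
        np.add.at(votes, (face_ids[valid], labels[valid]), 1)

    face_labels = votes.argmax(axis=1)
    face_labels[votes.sum(axis=1) == 0] = -1  # faces never seen with a label
    return face_labels


# Toy example: two 2x2 views of a 4-face object with two candidate parts.
view1_faces = np.array([[0, 0], [1, -1]])
view1_labels = np.array([[0, 0], [1, 1]])
view2_faces = np.array([[0, 1], [1, 2]])
view2_labels = np.array([[1, 1], [1, -1]])

result = aggregate_view_masks(
    [view1_faces, view2_faces],
    [view1_labels, view2_labels],
    num_faces=4,
    num_labels=2,
)
print(result)
```

In the toy example, face 0 gets two votes for part 0 and one for part 1, so the views disagree and the majority wins. Faces that are only seen at unlabeled pixels, or never seen at all, stay at -1.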
