Open-Vocabulary Semantic Part Segmentation of 3D Human

Keito Suzuki; Bang Du; Girish Krishnan; Kunyao Chen; Runfa Blark Li; and Truong Nguyen

arXiv:2502.19782·cs.CV·February 28, 2025

TL;DR

This paper introduces an open-vocabulary 3D human part segmentation method built on vision-language models. It targets fine-grained, zero-shot segmentation across several 3D representations and outperforms existing open-vocabulary 3D segmentation methods.

Contribution

The paper presents the first open-vocabulary segmentation approach for 3D humans, utilizing a new HumanCLIP model and a simple MaskFusion pipeline for efficient, accurate multi-view 3D part segmentation.

Findings

01. Outperforms state-of-the-art open-vocabulary 3D segmentation methods.

02. Effective across multiple 3D data formats, including meshes, point clouds, and Gaussian Splatting.

03. Achieves high accuracy in fine-grained human part segmentation in zero-shot settings.

Abstract

3D part segmentation is still an open problem in the field of 3D vision and AR/VR. Due to limited 3D labeled data, traditional supervised segmentation methods fall short in generalizing to unseen shapes and categories. Recently, advances in the zero-shot abilities of vision-language models have brought a surge in open-world 3D segmentation methods. While these methods show promising results for 3D scenes or objects, they do not generalize well to 3D humans. In this paper, we present the first open-vocabulary segmentation method capable of handling 3D humans. Our framework can segment the human category into desired fine-grained parts based on a textual prompt. We design a simple segmentation pipeline, leveraging SAM to generate multi-view proposals in 2D and proposing a novel HumanCLIP model to create unified embeddings for visual and textual inputs. Compared with existing pre-trained…
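The pipeline described in the abstract (2D mask proposals from several views, a shared embedding space for masks and text prompts, and a lift back to 3D) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the function names `label_masks` and `fuse_votes` are hypothetical, the embeddings are plain arrays standing in for HumanCLIP outputs, the masks are assumed to come from a proposal generator such as SAM, and per-point majority voting is only one plausible fusion rule, not necessarily the paper's MaskFusion.

```python
import numpy as np

def label_masks(mask_emb, text_emb):
    """Give each 2D mask the index of its most similar text prompt (cosine similarity)."""
    m = mask_emb / np.linalg.norm(mask_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    return np.argmax(m @ t.T, axis=1)

def fuse_votes(num_points, num_labels, views):
    """Majority vote over views (an assumed fusion rule).

    `views` is a list of (mask_labels, mask_point_ids) pairs, where
    mask_point_ids[i] lists the 3D point indices visible inside mask i.
    """
    votes = np.zeros((num_points, num_labels), dtype=np.int64)
    for mask_labels, mask_point_ids in views:
        for label, point_ids in zip(mask_labels, mask_point_ids):
            ids = np.asarray(point_ids, dtype=np.int64)
            np.add.at(votes, (ids, label), 1)  # one vote per covered point
    labels = votes.argmax(axis=1)
    labels[votes.sum(axis=1) == 0] = -1  # points never covered stay unlabeled
    return labels

# Toy example: 3 text prompts, 4 points, 2 views (all values are made up).
text_emb = np.eye(3)
view_a = (label_masks(np.array([[0.9, 0.1, 0.0], [0.0, 1.0, 0.2]]), text_emb), [[0, 1], [2]])
view_b = (label_masks(np.array([[1.0, 0.0, 0.0], [0.0, 0.8, 0.1]]), text_emb), [[0], [2]])
point_labels = fuse_votes(4, 3, [view_a, view_b])
```

In this toy setup, a point that no mask covers keeps the label `-1`. A real system would also need the camera projection that maps mask pixels to mesh vertices, points, or Gaussians, which is omitted here.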

Taxonomy

Topics: 3D Shape Modeling and Analysis · Human Pose and Action Recognition · Advanced Neural Network Applications

Methods: Contrastive Language-Image Pre-training · Segment Anything Model