Reasoning3D -- Grounding and Reasoning in 3D: Fine-Grained Zero-Shot Open-Vocabulary 3D Reasoning Part Segmentation via Large Vision-Language Models

Tianrun Chen; Chunan Yu; Jing Li; Jianqi Zhang; Lanyun Zhu; Deyi Ji; Yong Zhang; Ying Zang; Zejian Li; Lingyun Sun

arXiv:2405.19326 · cs.CV · May 30, 2024 · 2 cites

TL;DR

Reasoning3D introduces a zero-shot 3D part segmentation method leveraging large vision-language models, enabling fine-grained, context-aware object part understanding without extensive 3D training data.

Contribution

The paper presents a novel zero-shot 3D part segmentation approach using pre-trained 2D segmentation and language models, surpassing traditional category-specific methods.

Findings

01. Effective localization of 3D object parts based on textual queries
02. Generalizes well to articulated and scanned 3D objects
03. Rapid, training-free deployment for diverse applications

Abstract

In this paper, we introduce a new task: Zero-Shot 3D Reasoning Segmentation for searching for and localizing parts of objects. It is a new paradigm for 3D segmentation that transcends the limitations of previous category-specific 3D semantic segmentation, 3D instance segmentation, and open-vocabulary 3D segmentation. We design a simple baseline method, Reasoning3D, that can understand and execute complex commands to segment specific parts of 3D meshes at a fine-grained level, with contextual awareness and reasoned answers for interactive segmentation. Specifically, Reasoning3D leverages an off-the-shelf pre-trained 2D segmentation network, powered by Large Language Models (LLMs), to interpret user input queries in a zero-shot manner. Previous research has shown that extensive pre-training endows foundation models with prior world knowledge, enabling them to comprehend complex…

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Handwritten Text Recognition Techniques