Reason3D: Searching and Reasoning 3D Segmentation via Large Language Model

Kuan-Chih Huang; Xiangtai Li; Lu Qi; Shuicheng Yan; Ming-Hsuan Yang

arXiv:2405.17427 · cs.CV · February 11, 2025 · 2 cites

TL;DR

Reason3D is an LLM-based model that takes a 3D point cloud and a text prompt and returns both a textual response and a dense segmentation mask, so that reasoning about a scene is grounded in the points it refers to.

Contribution

This work introduces Reason3D, an LLM-based framework whose hierarchical mask decoder segments objects coarse-to-fine, supporting detailed 3D segmentation and reasoning tasks.

Findings

01. Effective 3D segmentation on the ScanNet and Matterport3D datasets.

02. Enables hierarchical searching and detailed question answering.

03. Demonstrates strong performance in 3D reasoning tasks.

Abstract

Recent advancements in multimodal large language models (LLMs) have demonstrated significant potential across various domains, particularly in concept reasoning. However, their applications in understanding 3D environments remain limited, primarily offering textual or numerical outputs without generating dense, informative segmentation masks. This paper introduces Reason3D, a novel LLM designed for comprehensive 3D understanding. Reason3D processes point cloud data and text prompts to produce textual responses and segmentation masks, enabling advanced tasks such as 3D reasoning segmentation, hierarchical searching, express referring, and question answering with detailed mask outputs. We propose a hierarchical mask decoder that employs a coarse-to-fine approach to segment objects within expansive scenes. It begins with a coarse location estimation, followed by object mask estimation,…
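The coarse-to-fine idea in the abstract (first a coarse location estimate, then an object mask) can be illustrated with a small toy sketch. This is not Reason3D's decoder, which is learned and driven by the LLM; it is only a hand-written stand-in for the two-stage control flow. Everything named here is an assumption made for illustration: the functions `coarse_location` and `fine_mask`, the grid cell size, the search radius, the score threshold, and the use of a dot product between hypothetical per-point features and a hypothetical query vector.

```python
from math import dist


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def coarse_location(points, feats, query, cell=2.0):
    """Stage 1 (assumed): bucket points into grid cells, pick the cell whose
    points match the query best on average, and return that cell's centroid."""
    cells = {}
    for p, f in zip(points, feats):
        key = tuple(int(c // cell) for c in p)
        cells.setdefault(key, []).append((p, dot(f, query)))
    best = max(cells.values(),
               key=lambda items: sum(s for _, s in items) / len(items))
    n = len(best)
    return tuple(sum(p[i] for p, _ in best) / n for i in range(3))


def fine_mask(points, feats, query, center, radius=1.5, thresh=0.5):
    """Stage 2 (assumed): a per-point boolean mask, restricted to points
    within `radius` of the coarse location and scoring above `thresh`."""
    return [dist(p, center) <= radius and dot(f, query) > thresh
            for p, f in zip(points, feats)]


# Toy scene: two points near the origin, three points near (5, 5, 0).
points = [(0.1, 0.1, 0.1), (0.3, 0.2, 0.1),
          (5.0, 5.0, 0.0), (5.2, 5.1, 0.0), (5.1, 4.9, 0.2)]
# Hypothetical 2-D per-point features; the query matches the second channel.
feats = [(1, 0), (1, 0), (0, 1), (0, 1), (1, 0)]
query = (0, 1)

center = coarse_location(points, feats, query)
mask = fine_mask(points, feats, query, center)
print(center, mask)
```

In this toy scene the coarse stage selects the cluster near (5, 5, 0), and the fine stage keeps only the two points in that cluster whose features match the query. Restricting the second stage to a neighbourhood of the coarse estimate is what keeps the mask step cheap in a large scene.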

Code & Models

Repositories

kuanchihhuang/reason3d (PyTorch, official)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Handwritten Text Recognition Techniques