Multimodal 3D Reasoning Segmentation with Complex Scenes
Xueying Jiang, Lewei Lu, Ling Shao, Shijian Lu

TL;DR
This paper introduces a new 3D reasoning segmentation task, a large-scale benchmark ReasonSeg3D, and a novel network MORE3D to improve understanding of complex multi-object 3D scenes with spatial relations and detailed explanations.
Contribution
It proposes a new 3D reasoning segmentation task, creates the ReasonSeg3D benchmark, and develops the MORE3D network for enhanced multi-object scene understanding.
Findings
MORE3D outperforms existing methods in reasoning and segmentation accuracy.
ReasonSeg3D provides a comprehensive platform for 3D reasoning research.
The approach captures spatial relations among objects and produces detailed textual explanations in complex scenes.
Abstract
Recent developments in multimodal learning have greatly advanced research on 3D scene understanding in various real-world tasks such as embodied AI. However, most existing studies face two common challenges: 1) they lack the reasoning ability needed for interaction and for interpreting human intentions, and 2) they focus on scenarios with single-category objects and over-simplified textual descriptions, neglecting multi-object scenarios with complicated spatial relations among objects. We address these challenges by proposing a 3D reasoning segmentation task for scenes containing multiple objects. The task produces 3D segmentation masks together with detailed textual explanations enriched by 3D spatial relations among objects. To this end, we create ReasonSeg3D, a large-scale and high-quality benchmark that integrates 3D segmentation masks and 3D…
Taxonomy
Topics: Robotic Path Planning Algorithms
Methods: Focus
