3DGraphLLM: Combining Semantic Graphs and Large Language Models for 3D Scene Understanding
Tatiana Zemskova, Dmitry Yudin

TL;DR
3DGraphLLM introduces a novel approach that combines semantic scene graphs with large language models to improve 3D scene understanding and natural language interaction in robotic applications.
Contribution
3DGraphLLM is the first method to explicitly incorporate semantic relationships into 3D scene graph representations for LLM-based 3D vision-language tasks.
Findings
Outperforms baselines without semantic relationships on multiple datasets
Enhances the quality of LLM responses in 3D scene understanding
Demonstrates the effectiveness of semantic-aware scene graphs in robotic applications
Abstract
A 3D scene graph represents a compact scene model by capturing both the objects present and the semantic relationships between them, making it a promising structure for robotic applications. To effectively interact with users, an embodied intelligent agent should be able to answer a wide range of natural language queries about the surrounding 3D environment. Large Language Models (LLMs) are beneficial solutions for user-robot interaction due to their natural language understanding and reasoning abilities. Recent methods for learning scene representations have shown that adapting these representations to the 3D world can significantly improve the quality of LLM responses. However, existing methods typically rely only on geometric information, such as object coordinates, and overlook the rich semantic relationships between objects. In this work, we propose 3DGraphLLM, a method for…
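To make the distinction in the abstract concrete, here is a minimal sketch of a scene graph that stores both object coordinates (geometric information) and labeled relationships between objects (semantic information). It is an illustration only, not the 3DGraphLLM implementation: the class names `SceneObject` and `SceneGraph`, the `to_prompt` method, and the plain-text serialization are all hypothetical. The abstract refers to learned scene representations, which this toy text encoding does not attempt to reproduce.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SceneObject:
    """A node of the graph: one object in the scene (hypothetical schema)."""
    obj_id: int
    label: str
    center: tuple[float, float, float]  # geometric information only


@dataclass
class SceneGraph:
    """Objects plus directed, labeled edges (subject_id, relation, object_id)."""
    objects: dict[int, SceneObject] = field(default_factory=dict)
    relations: list[tuple[int, str, int]] = field(default_factory=list)

    def add_object(self, obj: SceneObject) -> None:
        self.objects[obj.obj_id] = obj

    def add_relation(self, subject_id: int, relation: str, object_id: int) -> None:
        # Both endpoints must already be nodes of the graph.
        if subject_id not in self.objects or object_id not in self.objects:
            raise KeyError("both objects must be added before relating them")
        self.relations.append((subject_id, relation, object_id))

    def to_prompt(self, with_relations: bool = True) -> str:
        """Serialize the scene as text; optionally drop the semantic edges."""
        lines = []
        for obj in self.objects.values():
            x, y, z = obj.center
            lines.append(f"obj{obj.obj_id}: {obj.label} at ({x:.1f}, {y:.1f}, {z:.1f})")
        if with_relations:
            for s, rel, o in self.relations:
                lines.append(
                    f"obj{s} ({self.objects[s].label}) {rel} "
                    f"obj{o} ({self.objects[o].label})"
                )
        return "\n".join(lines)


# Toy scene: a lamp standing on a table.
graph = SceneGraph()
graph.add_object(SceneObject(0, "table", (1.0, 2.0, 0.4)))
graph.add_object(SceneObject(1, "lamp", (1.1, 2.0, 0.9)))
graph.add_relation(1, "standing on", 0)

print(graph.to_prompt(with_relations=False))  # coordinates only
print(graph.to_prompt())                      # coordinates plus relationships
```

With `with_relations=False` the description carries only coordinates, which corresponds to the geometry-only setting the abstract criticizes. The default output additionally states how the objects relate to one another.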
Taxonomy
Topics: Image Processing and 3D Reconstruction · Graph Theory and Algorithms · 3D Shape Modeling and Analysis
