When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models
Xianzheng Ma, Brandon Smart, Yash Bhalgat, Shuai Chen, Xinghui Li, Jian Ding, Jindong Gu, Dave Zhenyu Chen, Songyou Peng, Jia-Wang Bian, Philip H Torr, Marc Pollefeys, Matthias Nie{\ss}ner, Ian D Reid, Angel X. Chang, Iro Laina, Victor Adrian Prisacariu

TL;DR
This survey reviews the rapid advancements in multi-modal large language models capable of understanding and interacting with 3D spatial data, highlighting methodologies, applications, and future research directions.
Contribution
It provides a comprehensive overview and meta-analysis of 3D-LLMs, emphasizing their capabilities, current progress, and challenges in integrating language with 3D spatial understanding.
Findings
Significant progress in 3D-LLMs for scene understanding and navigation
Identification of key data representations like point clouds and NeRFs
Highlighting the need for novel approaches to fully harness 3D-LLMs
Abstract
As large language models (LLMs) evolve, their integration with 3D spatial data (3D-LLMs) has seen rapid progress, offering unprecedented capabilities for understanding and interacting with physical spaces. This survey provides a comprehensive overview of the methodologies enabling LLMs to process, understand, and generate 3D data. Highlighting the unique advantages of LLMs, such as in-context learning, step-by-step reasoning, open-vocabulary capabilities, and extensive world knowledge, we underscore their potential to significantly advance spatial comprehension and interaction within embodied Artificial Intelligence (AI) systems. Our investigation spans various 3D data representations, from point clouds to Neural Radiance Fields (NeRFs). It examines their integration with LLMs for tasks such as 3D scene understanding, captioning, question-answering, and dialogue, as well as LLM-based…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsNatural Language Processing Techniques · Topic Modeling · Semantic Web and Ontologies
