Large Language Models and 3D Vision for Intelligent Robotic Perception and Autonomy
Vinit Mehta, Charu Sharma, Karthick Thiyagarajan

TL;DR
This review examines how integrating Large Language Models with 3D vision can advance robotic perception, enabling more context-aware and autonomous next-generation robotic sensing systems through multimodal techniques.
Contribution
It provides a comprehensive analysis of current methodologies, applications, datasets, and challenges at the intersection of LLMs and 3D vision for robotics, highlighting future research directions.
Findings
Advances in scene understanding and text-to-3D generation.
Development of multimodal LLMs integrating touch, auditory, and thermal data.
Identification of key challenges such as real-time processing and cross-modal alignment.
Abstract
With the rapid advancement of artificial intelligence and robotics, the integration of Large Language Models (LLMs) with 3D vision is emerging as a transformative approach to enhancing robotic sensing technologies. This convergence enables machines to perceive, reason and interact with complex environments through natural language and spatial understanding, bridging the gap between linguistic intelligence and spatial perception. This review provides a comprehensive analysis of state-of-the-art methodologies, applications and challenges at the intersection of LLMs and 3D vision, with a focus on next-generation robotic sensing technologies. We first introduce the foundational principles of LLMs and 3D data representations, followed by an in-depth examination of 3D sensing technologies critical for robotics. The review then explores key advancements in scene understanding, text-to-3D…
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning
