Foundation Model Driven Robotics: A Comprehensive Review
Muhammad Tayyab Khan, Ammar Waheed

TL;DR
This comprehensive review explores how foundation models like LLMs and VLMs are revolutionizing robotics by enhancing perception, reasoning, and interaction, while also discussing current challenges and future directions.
Contribution
It provides a structured synthesis of recent developments in foundation model-driven robotics, emphasizing integrated system strategies and practical feasibility in real-world applications.
Findings
Highlights procedural scene generation and multimodal reasoning as key enabling trends.
Identifies key bottlenecks such as embodiment limitations and data scarcity.
Discusses future challenges such as real-time operation and model interpretability.
Abstract
The rapid emergence of foundation models, particularly Large Language Models (LLMs) and Vision-Language Models (VLMs), has introduced a transformative paradigm in robotics. These models offer powerful capabilities in semantic understanding, high-level reasoning, and cross-modal generalization, enabling significant advances in perception, planning, control, and human-robot interaction. This critical review provides a structured synthesis of recent developments, categorizing applications across simulation-driven design, open-world execution, sim-to-real transfer, and adaptable robotics. Unlike existing surveys that emphasize isolated capabilities, this work highlights integrated, system-level strategies and evaluates their practical feasibility in real-world environments. Key enabling trends such as procedural scene generation, policy generalization, and multimodal reasoning are discussed…
Taxonomy
Topics: Model-Driven Software Engineering Techniques
