Foundation Model Driven Robotics: A Comprehensive Review

Muhammad Tayyab Khan; Ammar Waheed

arXiv:2507.10087·cs.RO·July 15, 2025

Open Access

TL;DR

This review examines how foundation models such as LLMs and VLMs are changing robotics by improving perception, reasoning, and interaction, and it discusses current challenges and future directions.

Contribution

It provides a structured synthesis of recent developments in foundation model-driven robotics, emphasizing integrated system strategies and practical feasibility in real-world applications.

Findings

01

The review highlights procedural scene generation and multimodal reasoning as key enabling trends.

02

It identifies key bottlenecks, including embodiment limitations and data scarcity.

03

It discusses open challenges such as real-time operation and model interpretability.

Abstract

The rapid emergence of foundation models, particularly Large Language Models (LLMs) and Vision-Language Models (VLMs), has introduced a transformative paradigm in robotics. These models offer powerful capabilities in semantic understanding, high-level reasoning, and cross-modal generalization, enabling significant advances in perception, planning, control, and human-robot interaction. This critical review provides a structured synthesis of recent developments, categorizing applications across simulation-driven design, open-world execution, sim-to-real transfer, and adaptable robotics. Unlike existing surveys that emphasize isolated capabilities, this work highlights integrated, system-level strategies and evaluates their practical feasibility in real-world environments. Key enabling trends such as procedural scene generation, policy generalization, and multimodal reasoning are discussed…

Taxonomy

Topics: Model-Driven Software Engineering Techniques