Towards Embodied Agentic AI: Review and Classification of LLM- and VLM-Driven Robot Autonomy and Interaction

Sahar Salimpour; Lei Fu; Kajetan Rachwał; Pascal Bertrand; Kevin O'Sullivan; Robert Jakob; Farhad Keramat; Leonardo Militano; Giovanni Toffetti; Harry Edelman; Jorge Peña Queralta

arXiv:2508.05294 · cs.RO · November 14, 2025

TL;DR

This paper reviews recent advances in robot autonomy driven by foundation models such as LLMs and VLMs, highlighting architectures that enable reasoning, planning, and interaction in robotic systems.

Contribution

It provides a comprehensive taxonomy and comparative analysis of agentic architectures integrating foundation models for robot autonomy and interaction.

Findings

01 Agentic architectures enable reasoning over natural language instructions.

02 Integration of APIs and planning enhances robot capabilities.

03 Community projects and frameworks are shaping emerging trends.

Abstract

Foundation models, including large language models (LLMs) and vision-language models (VLMs), have recently enabled novel approaches to robot autonomy and human-robot interfaces. In parallel, vision-language-action models (VLAs) or large behavior models (LBMs) are increasing the dexterity and capabilities of robotic systems. This survey paper reviews works that advance agentic applications and architectures, including initial efforts with GPT-style interfaces and more complex systems where AI agents function as coordinators, planners, perception actors, or generalist interfaces. Such agentic architectures allow robots to reason over natural language instructions, invoke APIs, plan task sequences, or assist in operations and diagnostics. In addition to peer-reviewed research, due to the fast-evolving nature of the field, we highlight and include community-driven projects, ROS packages,…
