Towards Embodied Agentic AI: Review and Classification of LLM- and VLM-Driven Robot Autonomy and Interaction
Sahar Salimpour, Lei Fu, Kajetan Rachwał, Pascal Bertrand, Kevin O'Sullivan, Robert Jakob, Farhad Keramat, Leonardo Militano, Giovanni Toffetti, Harry Edelman, Jorge Peña Queralta

TL;DR
This paper reviews recent advances in robot autonomy driven by foundation models like LLMs and VLMs, highlighting architectures that enable reasoning, planning, and interaction in robotic systems.
Contribution
It provides a comprehensive taxonomy and comparative analysis of agentic architectures integrating foundation models for robot autonomy and interaction.
Findings
Agentic architectures enable reasoning over natural language instructions.
Agents that invoke APIs and plan task sequences extend robot capabilities, including support for operations and diagnostics.
Community-driven projects and ROS packages, alongside peer-reviewed research, are shaping emerging trends.
Abstract
Foundation models, including large language models (LLMs) and vision-language models (VLMs), have recently enabled novel approaches to robot autonomy and human-robot interfaces. In parallel, vision-language-action models (VLAs) or large behavior models (LBMs) are increasing the dexterity and capabilities of robotic systems. This survey paper reviews works that advance agentic applications and architectures, including initial efforts with GPT-style interfaces and more complex systems where AI agents function as coordinators, planners, perception actors, or generalist interfaces. Such agentic architectures allow robots to reason over natural language instructions, invoke APIs, plan task sequences, or assist in operations and diagnostics. In addition to peer-reviewed research, due to the fast-evolving nature of the field, we highlight and include community-driven projects, ROS packages,…
