Embodied AI with Foundation Models for Mobile Service Robots: A Systematic Review
Matthew Lisondra, Beno Benhabib, Goldie Nejat

TL;DR
This systematic review explores how foundation models are integrated into mobile service robots to enhance perception, reasoning, and action, addressing key challenges and applications in real-world human environments.
Contribution
It provides a comprehensive analysis of recent advances in foundation models for embodied AI in mobile robots, highlighting technical challenges, applications, and future research directions.
Findings
Foundation models improve multimodal perception and control in robots.
Recent advances enable context-aware and socially responsive robot behaviors.
The review also discusses the ethical and societal implications of deploying such robots.
Abstract
Rapid advancements in foundation models, including Large Language Models, Vision-Language Models, Multimodal Large Language Models, and Vision-Language-Action Models, have opened new avenues for embodied AI in mobile service robotics. By combining foundation models with the principles of embodied AI, where intelligent systems perceive, reason, and act through physical interaction, mobile service robots can achieve more flexible understanding, adaptive behavior, and robust task execution in dynamic real-world environments. Despite this progress, embodied AI for mobile service robots continues to face fundamental challenges related to the translation of natural language instructions into executable robot actions, multimodal perception in human-centered environments, uncertainty estimation for safe decision-making, and computational constraints for real-time onboard deployment. In this…
