Transforming Monolithic Foundation Models into Embodied Multi-Agent Architectures for Human-Robot Collaboration
Nan Sun, Bo Mao, Yongchang Li, Chenxu Wang, Di Guo, Huaping Liu

TL;DR
This paper introduces InteractGen, a multi-agent framework powered by large language models that decomposes robot intelligence into specialized agents, enhancing human-robot collaboration and adaptability in service robots.
Contribution
The paper proposes a novel multi-agent architecture that integrates foundation models as regulated components, enabling scalable, adaptable, and socially grounded robot autonomy.
Findings
Improved task success rates in real-world deployment
Enhanced adaptability and human-robot collaboration
Demonstrated effectiveness over monolithic models
Abstract
Foundation models have become central to unifying perception and planning in robotics, yet real-world deployment exposes a mismatch between their monolithic assumption that a single model can handle all cognitive functions and the distributed, dynamic nature of practical service workflows. Vision-language models offer strong semantic understanding but lack embodiment-aware action capabilities while relying on hand-crafted skills. Vision-Language-Action policies enable reactive manipulation but remain brittle across embodiments, weak in geometric grounding, and devoid of proactive collaboration mechanisms. These limitations indicate that scaling a single model alone cannot deliver reliable autonomy for service robots operating in human-populated settings. To address this gap, we present InteractGen, an LLM-powered multi-agent framework that decomposes robot intelligence into specialized…
Taxonomy
Topics: Social Robot Interaction and HRI · Human-Automation Interaction and Safety · Robot Manipulation and Learning
