Building Cooperative Embodied Agents Modularly with Large Language Models
Hongxin Zhang, Weihua Du, Jiaming Shan, Qinhong Zhou, Yilun Du, Joshua, B. Tenenbaum, Tianmin Shu, Chuang Gan

TL;DR
This paper introduces CoELA, a modular framework integrating large language models into embodied multi-agent systems for improved cooperation, communication, and planning in complex environments, surpassing traditional methods.
Contribution
The work presents a novel modular approach that leverages LLMs for embodied multi-agent cooperation, incorporating perception, memory, and execution for long-horizon tasks.
Findings
CoELA driven by GPT-4 outperforms planning-based methods.
Emergent effective communication among agents.
Natural language communication increases human trust and cooperation.
Abstract
In this work, we address challenging multi-agent cooperation problems with decentralized control, raw sensory observations, costly communication, and multi-objective tasks instantiated in various embodied environments. While previous research either presupposes a cost-free communication channel or relies on a centralized controller with shared observations, we harness the commonsense knowledge, reasoning ability, language comprehension, and text generation prowess of LLMs and seamlessly incorporate them into a cognitive-inspired modular framework that integrates with perception, memory, and execution. Thus building a Cooperative Embodied Language Agent CoELA, who can plan, communicate, and cooperate with others to accomplish long-horizon tasks efficiently. Our experiments on C-WAH and TDW-MAT demonstrate that CoELA driven by GPT-4 can surpass strong planning-based methods and exhibit…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
Taxonomy
TopicsMultimodal Machine Learning Applications · Topic Modeling
MethodsMulti-Head Attention · Attention Is All You Need · Layer Normalization · Absolute Position Encodings · Byte Pair Encoding · Linear Layer · Label Smoothing · Adam · Position-Wise Feed-Forward Layer · Residual Connection
