TeamCraft: A Benchmark for Multi-Modal Multi-Agent Systems in Minecraft
Qian Long, Zhi Li, Ran Gong, Ying Nian Wu, Demetri Terzopoulos, Xiaofeng Gao

TL;DR
TeamCraft introduces a comprehensive multi-modal multi-agent benchmark in Minecraft to evaluate and improve generalization of collaborative agents across diverse tasks and environments.
Contribution
The paper presents a new benchmark, TeamCraft, with extensive tasks, demonstrations, and evaluation protocols for multi-modal multi-agent collaboration in Minecraft.
Findings
Existing models struggle with generalizing to new goals and scenes.
Significant challenges remain in multi-agent collaboration in complex environments.
The benchmark reveals key limitations of current approaches.
Abstract
Collaboration is a cornerstone of society. In the real world, human teammates make use of multi-sensory data to tackle challenging tasks in ever-changing environments. It is essential for embodied agents collaborating in visually-rich environments replete with dynamic interactions to understand multi-modal observations and task specifications. To evaluate the performance of generalizable multi-modal collaborative agents, we present TeamCraft, a multi-modal multi-agent benchmark built on top of the open-world video game Minecraft. The benchmark features 55,000 task variants specified by multi-modal prompts, procedurally-generated expert demonstrations for imitation learning, and carefully designed protocols to evaluate model generalization capabilities. We also perform extensive analyses to better understand the limitations and strengths of existing approaches. Our results indicate that…
Taxonomy
Topics: Service-Oriented Architecture and Web Services · Simulation Techniques and Applications · Mobile Agent-Based Network Management
