MindAgent: Emergent Gaming Interaction
Ran Gong, Qiuyuan Huang, Xiaojian Ma, Hoi Vo, Zane Durante, Yusuke Noda, Zilong Zheng, Song-Chun Zhu, Demetri Terzopoulos, Li Fei-Fei, Jianfeng Gao

TL;DR
MindAgent introduces a new infrastructure and benchmark for evaluating emergent planning and coordination capabilities of large language models in multi-agent gaming scenarios, including human-AI collaboration and real-world applications.
Contribution
The paper presents MindAgent, a novel infrastructure and CUISINEWORLD benchmark for assessing multi-agent collaboration and coordination in gaming, integrating LLMs with human interaction and real-world deployment.
Findings
Effective evaluation of multi-agent collaboration efficiency.
Demonstrated LLMs' ability to coordinate in gaming scenarios.
Potential for real-world gaming applications with VR adaptation.
Abstract
Large Language Models (LLMs) are capable of performing complex scheduling in a multi-agent system and can coordinate these agents to complete sophisticated tasks that require extensive collaboration. However, despite the introduction of numerous gaming frameworks, the community lacks sufficient benchmarks for building a general multi-agent collaboration infrastructure that encompasses both LLM and human-NPC collaboration. In this work, we propose a novel infrastructure, MindAgent, to evaluate emergent planning and coordination capabilities for gaming interaction. In particular, our infrastructure leverages existing gaming frameworks to i) require understanding of the coordinator for a multi-agent system, ii) collaborate with human players via un-finetuned proper instructions, and iii) establish in-context learning on few-shot prompts with feedback. Furthermore, we…
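The coordinator pattern named in the abstract (a central planner that assigns actions to several agents from a few-shot prompt and is re-prompted with feedback) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the authors' implementation: `build_prompt`, `validate`, `dispatch_step`, `stub_planner`, the action names, and the prompt layout are all hypothetical, and the stub stands in for a real LLM call.

```python
from typing import Callable, Dict, List

# Hypothetical action vocabulary, used only for this illustration.
VALID_ACTIONS = {"goto", "get", "put", "activate", "noop"}


def build_prompt(few_shot: List[str], state: str, feedback: List[str]) -> str:
    """Concatenate few-shot demos, the current state, and earlier feedback."""
    parts = list(few_shot) + [f"State: {state}"]
    parts += [f"Feedback: {msg}" for msg in feedback]
    return "\n".join(parts)


def validate(plan: Dict[str, str], agents: List[str]) -> List[str]:
    """Return one feedback message per problem found in the plan."""
    errors = []
    for agent in agents:
        action = plan.get(agent)
        if action is None:
            errors.append(f"{agent} has no action")
        elif action.split("_")[0] not in VALID_ACTIONS:
            errors.append(f"{agent}: unknown action '{action}'")
    return errors


def dispatch_step(
    planner: Callable[[str], Dict[str, str]],
    few_shot: List[str],
    state: str,
    agents: List[str],
    max_retries: int = 3,
) -> Dict[str, str]:
    """Ask the planner for one action per agent, re-prompting with feedback."""
    feedback: List[str] = []
    for _ in range(max_retries):
        plan = planner(build_prompt(few_shot, state, feedback))
        errors = validate(plan, agents)
        if not errors:
            return plan
        feedback.extend(errors)  # the next prompt includes these messages
    # Fall back to a safe default if the planner never produces a valid plan.
    return {agent: "noop" for agent in agents}


def stub_planner(prompt: str) -> Dict[str, str]:
    """Stand-in for an LLM: corrects itself only after seeing feedback."""
    if "Feedback:" not in prompt:
        return {"agent0": "goto_storage", "agent1": "fly_away"}
    return {"agent0": "goto_storage", "agent1": "get_tomato"}


plan = dispatch_step(stub_planner, ["demo"], "kitchen", ["agent0", "agent1"])
print(plan)
```

In this sketch the first stub plan is rejected because `fly_away` is not in the assumed vocabulary; the resulting feedback message is appended to the next prompt, and the second plan is accepted.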
Peer Reviews
Decision: ICLR 2024 Conference Withdrawn Submission
The work is solid and important to the community: it provides a benchmark that supports the implementation of a general multi-agent infrastructure that encompasses collaboration between large language models (LLMs) and human-NPCs. The paper is well-written and organized.
1. The font size of Figure 4 is too small. 2. The paper does not seem to provide a clear definition of the terms $q_{pim}$ and $c_{pim}$ in Equation 2.
- The creation and development of a new benchmark, entailing a substantial amount of effort, is a significant strength of the paper. - The introduction of a novel metric, CoS (Coordination Score), to assess coordination capabilities is a noteworthy contribution.
- The proposed LLM coordination system, MindAgent, lacks novelty, as its approach of leveraging a scratchpad or memory has been extensively explored in prior research. - The proposed environment features limited state and action spaces and furnishes all the requisite recipes to solve tasks, possibly oversimplifying the challenge. In this situation, heuristic or RL planners could be readily employed. However, the paper does not provide comparisons with these approaches. - The proposed environment by
Contributions: 1. Evaluate LLM agent in a multi-agent gaming environment; test LLM's ability to serve as a task allocator. 2. Proposes a collaboration score to evaluate and benchmark coordination agents.
1. Technical novelty: The technical novelty of this work is lacking because the multi-agent setting is really just adding an LLM call to allocate what different agents should do. Concretely, it is a simple prompting and what is supposed to be an optimization problem is all packed into one LLM call. This doesn't seem to be a very principled way of studying the agent allocation. 2. Problem setting: the multi-agent collaboration problem is reduced to a top-down allocation: a distributor distribut
***Originality & Significance*** Although this work is mainly application-oriented for LLMs, its novelty lies in proposing an infrastructure to explore the potential of LLMs in multi-agent planning and conducting experiments on multiple LLMs. I believe this work will inspire the LLM community and provide a valuable test bed for evaluating LLM capabilities. ***Quality & Clarity*** This work provides detailed descriptions and examples of the components of MindAgent. The paper also offers a good
Please refer to the questions section.
- The paper introduces a new text-based multi-agent cooperation benchmark with a vivid visual appearance. - The paper conducts experiments involving more than 2 agents and humans. - The paper shows some effort in transferring the method to a real-world gaming scenario, Minecraft.
1. The Benchmark's Contribution Falls Short - CuisineWorld is based on the video game Overcooked, which is popular for measuring multi-agent cooperation and already has many existing environments [1][2]. The benchmark introduces more cuisines but falls short in diversifying kitchen maps when compared to previous Overcooked environments. - Table 12 provides a good summary of related benchmarks, though some related embodied multi-agent cooperation benchmarks with most of the features in the table are
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Natural Language Processing Techniques
