YOLO-MARL: You Only LLM Once for Multi-Agent Reinforcement Learning
Yuan Zhuang, Yi Shen, Zhili Zhang, Yuxiao Chen, Fei Miao

TL;DR
YOLO-MARL introduces a framework that uses large language models once for high-level planning in multi-agent reinforcement learning, reducing costs while improving cooperative strategy learning.
Contribution
It proposes a novel approach that leverages LLMs only once per environment for planning, enhancing efficiency and effectiveness in MARL.
Findings
YOLO-MARL outperforms traditional MARL algorithms in tested environments.
The approach reduces inference costs by limiting LLM interactions to a single use per environment.
Decentralized policies trained with YOLO-MARL operate independently of LLMs.
Abstract
Advancements in deep multi-agent reinforcement learning (MARL) have positioned it as a promising approach for decision-making in cooperative games. However, it remains challenging for MARL agents to learn cooperative strategies in some game environments. Recently, large language models (LLMs) have demonstrated emergent reasoning capabilities, making them promising candidates for enhancing coordination among agents. However, because of their size, it can be expensive to query LLMs frequently for the actions that agents can take. In this work, we propose You Only LLM Once for MARL (YOLO-MARL), a novel framework that leverages the high-level task planning capabilities of LLMs to improve the policy learning process of multiple agents in cooperative games. Notably, for each game environment, YOLO-MARL only requires a one-time interaction with LLMs in the proposed strategy…
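The "LLM once, then train" idea described above can be illustrated with a toy sketch. This is not the authors' implementation: `StubLLM`, `generate_planner`, `shaped_reward`, the line-world environment, and the reward bonus are all hypothetical names and assumptions made for illustration. In particular, the abstract is truncated before it says how the plan feeds into training, so the shaping rule here is a guess. The point is only that the planner is obtained from a single query and then reused as an ordinary function while decentralized policies are trained.

```python
import random
from typing import Callable, Dict

State = Dict[str, int]   # agent name -> position on a line (toy environment)
Plan = Dict[str, str]    # agent name -> suggested high-level action
ACTIONS = ["left", "stay", "right"]
MOVES = {"left": -1, "stay": 0, "right": 1}
TARGET = 3


class StubLLM:
    """Hypothetical stand-in for an LLM client; counts how often it is queried."""

    def __init__(self) -> None:
        self.calls = 0

    def generate_planner(self, env_description: str) -> Callable[[State], Plan]:
        self.calls += 1
        # A real LLM would return generated planning code; this one is hard-coded.
        def planner(state: State) -> Plan:
            return {a: "right" if p < TARGET else "stay" for a, p in state.items()}
        return planner


def shaped_reward(env_reward: float, action: str, suggestion: str,
                  bonus: float = 0.1) -> float:
    """Assumed shaping rule: a small bonus when the action follows the plan."""
    return env_reward + (bonus if action == suggestion else 0.0)


def train(planner: Callable[[State], Plan], episodes: int = 200,
          steps: int = 10, seed: int = 0) -> Dict[str, Dict[int, Dict[str, float]]]:
    rng = random.Random(seed)
    # One value table per agent: decentralized, with no access to the LLM.
    q: Dict[str, Dict[int, Dict[str, float]]] = {"agent_0": {}, "agent_1": {}}
    for _ in range(episodes):
        state = {"agent_0": 0, "agent_1": 1}
        for _ in range(steps):
            plan = planner(state)  # plain function call, not an LLM query
            for agent, pos in list(state.items()):
                values = q[agent].setdefault(pos, {a: 0.0 for a in ACTIONS})
                if rng.random() < 0.2:          # epsilon-greedy exploration
                    action = rng.choice(ACTIONS)
                else:
                    action = max(values, key=values.get)
                new_pos = max(0, min(TARGET, pos + MOVES[action]))
                env_reward = 1.0 if new_pos == TARGET else 0.0
                reward = shaped_reward(env_reward, action, plan[agent])
                # Simplified bandit-style update (no bootstrapping), for brevity.
                values[action] += 0.5 * (reward - values[action])
                state[agent] = new_pos
    return q


llm = StubLLM()
planner = llm.generate_planner("two agents must each reach position 3")
q_tables = train(planner)
print("LLM queries during the whole run:", llm.calls)
```

After training, the value tables in `q_tables` can be used greedily without the planner or the LLM, which mirrors the finding that the trained decentralized policies operate independently of LLMs.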
Peer Reviews
Decision · Submitted to ICLR 2025
I believe that using LLMs to enhance collaboration in MARL is a highly promising direction.
- The methodology throughout the paper doesn’t feel novel to me; I’ve seen several similar studies, and the quality of this paper doesn’t seem satisfactory.
- None of the result figures appear to be processed; it looks as though they were directly downloaded from WANDB. Some figures show results from only one run, lacking statistical significance. I believe at least five runs should be conducted for each result.
- Certain detailed introductions seem unnecessary, such as STATE INTERPRETATION, whi
(1) The use of LLMs in MARL to reduce the frequency of model calls is relatively novel. The one-time interaction strategy for each environment, minimizing computational overhead while leveraging LLM's capability, is an innovative approach. Addressing the computational inefficiency in MARL through LLMs is of high relevance, especially given the increasing complexity and scale of multi-agent systems. The potential application across different domains enhances the paper's impact.
(2) The experimen
(1) **The necessity of using MARL settings as opposed to single-agent RL isn't convincingly justified. The claim of innovation specifically in the MARL domain seems overstretched without substantial differentiation from potential single-agent applications.**
(2) The paper fails to discuss several relevant studies that also integrate LLMs with multi-agent systems, which could question the novelty and depth of the literature review. Notably, it omits significant recent works, which could provide
- The proposed idea is new, interesting, and well-motivated.
- The paper is easy to read and follow.
- Ample supplementary material containing examples and code helps in better understanding the proposed approach.
- Some more experiments and analysis/explanations may be required.
- See questions.
Taxonomy
Topics: Multi-Agent Systems and Negotiation
