YOLO-MARL: You Only LLM Once for Multi-Agent Reinforcement Learning

Yuan Zhuang; Yi Shen; Zhili Zhang; Yuxiao Chen; Fei Miao

arXiv:2410.03997 · cs.MA · June 19, 2025

TL;DR

YOLO-MARL introduces a framework that queries a large language model (LLM) only once per environment for high-level planning in multi-agent reinforcement learning (MARL), reducing inference costs while improving how agents learn cooperative strategies.

Contribution

It proposes a novel approach that consults LLMs only once per environment for planning, improving both the efficiency and the effectiveness of MARL training.

Findings

01. YOLO-MARL outperforms traditional MARL algorithms in the tested environments.

02. The approach reduces inference costs by limiting LLM interaction to a single use per environment.

03. Decentralized policies trained with YOLO-MARL operate independently of LLMs.

Abstract

Advancements in deep multi-agent reinforcement learning (MARL) have positioned it as a promising approach for decision-making in cooperative games. However, it remains challenging for MARL agents to learn cooperative strategies in some game environments. Recently, large language models (LLMs) have demonstrated emergent reasoning capabilities, making them promising candidates for enhancing coordination among agents. However, because of their size, frequently querying LLMs for the actions agents can take can be expensive. In this work, we propose You Only LLM Once for MARL (YOLO-MARL), a novel framework that leverages the high-level task-planning capabilities of LLMs to improve the policy learning of multiple agents in cooperative games. Notably, for each game environment, YOLO-MARL requires only a one-time interaction with LLMs in the proposed strategy…
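The cost claim rests on a call pattern: the LLM is consulted once per environment, and its output is reused at every training step instead of being queried per action. A minimal Python sketch of that pattern follows, assuming the single interaction yields a planning function that assigns a task to each agent and that this assignment shapes rewards during training. Everything here is hypothetical: `query_llm_for_planning_function` is a stub standing in for the LLM, the task labels and the `bonus`/`penalty` values are illustrative, and the random action choice is a placeholder for a MARL learner, so no policy is actually trained.

```python
import random
from typing import Callable, Dict, Tuple

# Hypothetical planning-function type: global state -> task label per agent id.
PlanningFn = Callable[[Dict[str, float]], Dict[int, str]]

LLM_CALLS = 0  # how many times the stubbed LLM has been queried


def query_llm_for_planning_function(env_description: str) -> PlanningFn:
    """Hypothetical stand-in for the single LLM interaction.

    A real pipeline would put env_description into a prompt and turn the
    LLM's reply into a callable; this stub ignores it and returns a fixed rule.
    """
    global LLM_CALLS
    LLM_CALLS += 1

    def planning_fn(state: Dict[str, float]) -> Dict[int, str]:
        # Toy rule: agents nearer than 0.5 to the target "collect", the rest "explore".
        return {
            i: "collect" if state[f"dist_{i}"] < 0.5 else "explore"
            for i in range(int(state["n_agents"]))
        }

    return planning_fn


def shaped_reward(env_reward: float, action_task: str, assigned_task: str,
                  bonus: float = 0.1, penalty: float = 0.1) -> float:
    # Assumed shaping: bonus if the action matches the assigned task, else a penalty.
    return env_reward + (bonus if action_task == assigned_task else -penalty)


def train(num_episodes: int, steps_per_episode: int, seed: int = 0) -> Tuple[float, int]:
    """Toy loop (no learning); returns (total shaped reward, LLM calls made)."""
    rng = random.Random(seed)
    calls_before = LLM_CALLS
    # The only LLM interaction: generate the planning function once, up front.
    planning_fn = query_llm_for_planning_function("toy two-agent foraging task")
    total = 0.0
    for _ in range(num_episodes):
        for _ in range(steps_per_episode):
            state = {"n_agents": 2.0, "dist_0": rng.random(), "dist_1": rng.random()}
            assigned = planning_fn(state)  # reused every step, no new LLM call
            for agent in range(2):
                # Placeholder for a learning policy: actions are task labels here.
                action = rng.choice(["collect", "explore"])
                near = state[f"dist_{agent}"] < 0.5
                env_reward = 1.0 if (action == "collect" and near) else 0.0
                total += shaped_reward(env_reward, action, assigned[agent])
    return total, LLM_CALLS - calls_before
```

For example, `train(100, 50)` and `train(5, 10)` each report exactly one LLM call, however many episodes and steps run. After training, a decentralized policy would act on local observations alone, with neither `planning_fn` nor the LLM in the loop.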

Peer Reviews

Decision · Submitted to ICLR 2025

Reviewer 01 · Rating 3 · Confidence 4

Strengths

I believe that using LLMs to enhance collaboration in MARL is a highly promising direction.

Weaknesses

- The methodology throughout the paper doesn’t feel novel to me; I’ve seen several similar studies, and the quality of this paper doesn’t seem satisfactory.
- None of the result figures appear to be processed; they look as though they were downloaded directly from Weights & Biases (WandB). Some figures show results from only one run, lacking statistical significance. I believe at least five runs should be conducted for each result.
- Certain detailed introductions seem unnecessary, such as STATE INTERPRETATION, whi…

Reviewer 02 · Rating 3 · Confidence 5

Strengths

(1) The use of LLMs in MARL with a reduced frequency of model calls is relatively novel. The one-time interaction strategy for each environment, which minimizes computational overhead while still leveraging the LLM's capabilities, is an innovative approach. Addressing the computational inefficiency of LLM-assisted MARL is highly relevant, especially given the increasing complexity and scale of multi-agent systems. The potential for application across different domains enhances the paper's impact.
(2) The experimen…

Weaknesses

(1) **The necessity of using MARL settings as opposed to single-agent RL isn't convincingly justified. The claim of innovation specifically in the MARL domain seems overstretched without substantial differentiation from potential single-agent applications.**
(2) The paper fails to discuss several relevant studies that also integrate LLMs with multi-agent systems, which calls into question the novelty of the work and the depth of its literature review. Notably, it omits significant recent works, which could provide…

Reviewer 03 · Rating 5 · Confidence 4

Strengths

- The proposed idea is new, interesting, and well-motivated.
- The paper is easy to read and follow.
- Ample supplementary material containing examples and code helps in understanding the proposed approach.

Weaknesses

- Some more experiments and analysis/explanations may be required.
- See questions.

Taxonomy

Topics: Multi-Agent Systems and Negotiation