An Extended Benchmarking of Multi-Agent Reinforcement Learning Algorithms in Complex Fully Cooperative Tasks

George Papadopoulos; Andreas Kontogiannis; Foteini Papadopoulou; Chaido Poulianou; Ioannis Koumentis; George Vouros

arXiv:2502.04773 · cs.LG · July 8, 2025

TL;DR

This paper conducts a comprehensive evaluation of multi-agent reinforcement learning algorithms across diverse complex cooperative tasks, revealing limitations of current benchmarks and providing an extended evaluation framework with open-source tools.

Contribution

The paper introduces a systematic benchmarking approach for cooperative MARL algorithms across a range of complex tasks, including tasks with high-dimensional (e.g., image) observations, and releases an extended open-source evaluation library.

Findings

1. Many state-of-the-art algorithms underperform on complex benchmarks.
2. Performance varies significantly across different types of cooperative tasks.
3. The extended benchmark suite reveals limitations of existing MARL algorithms.

Abstract

Multi-Agent Reinforcement Learning (MARL) has recently emerged as a significant area of research. However, MARL evaluation often lacks systematic diversity, hindering a comprehensive understanding of algorithms' capabilities. In particular, cooperative MARL algorithms are predominantly evaluated on benchmarks such as SMAC and GRF, which primarily feature team game scenarios and do not adequately assess the various capabilities agents need in fully cooperative real-world tasks such as multi-robot cooperation and warehouse, resource management, search and rescue, and human-AI cooperation. Moreover, MARL algorithms are mainly evaluated on low-dimensional state spaces, so their performance on high-dimensional (e.g., image) observations is not well studied. To fill this gap, this paper highlights the crucial need for expanding systematic evaluation across a wider array of…

Code & Models

Repositories

ailabdsunipi/pymarlzooplus (PyTorch, official)

Taxonomy

Topics: Reinforcement Learning in Robotics