Dispelling the Mirage of Progress in Offline MARL through Standardised Baselines and Evaluation

Claude Formanek; Callum Rhys Tilbury; Louise Beyers; Jonathan Shock; Arnu Pretorius

arXiv:2406.09068 · cs.LG · October 31, 2024

TL;DR

This paper highlights inconsistencies in how offline MARL algorithms are evaluated, shows that simple, well-implemented baselines often match or outperform more complex methods, and proposes standardized evaluation protocols to improve scientific rigor.

Contribution

The paper introduces a standardized evaluation methodology and baseline implementations that match or outperform many existing offline MARL algorithms.

Findings

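01 Simple baselines match or surpass the reported SOTA on 35 of the 47 datasets used in prior work (almost 75%).

02 Baselines and evaluation protocols in published offline MARL work are inconsistent, which makes progress hard to assess.

03 Standardized protocols improve reproducibility and comparison.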

Abstract

Offline multi-agent reinforcement learning (MARL) is an emerging field with great promise for real-world applications. Unfortunately, the current state of research in offline MARL is plagued by inconsistencies in baselines and evaluation protocols, which ultimately makes it difficult to accurately assess progress, trust newly proposed innovations, and allow researchers to easily build upon prior work. In this paper, we firstly identify significant shortcomings in existing methodologies for measuring the performance of novel algorithms through a representative study of published offline MARL work. Secondly, by directly comparing to this prior work, we demonstrate that simple, well-implemented baselines can achieve state-of-the-art (SOTA) results across a wide range of tasks. Specifically, we show that on 35 out of 47 datasets used in prior work (almost 75% of cases), we match or surpass…

Code & Models

Repositories

instadeepai/og-marl
tf · Official

Videos

Dispelling the Mirage of Progress in Offline MARL through Standardised Baselines and Evaluation · slideslive

Taxonomy

Topics: Systems Engineering Methodologies and Applications · Safety Systems Engineering in Autonomy