Dispelling the Mirage of Progress in Offline MARL through Standardised Baselines and Evaluation
Claude Formanek, Callum Rhys Tilbury, Louise Beyers, Jonathan Shock, Arnu Pretorius

TL;DR
This paper highlights inconsistencies in offline MARL evaluation, demonstrates simple baselines often outperform complex methods, and proposes standardized evaluation protocols to improve scientific rigor.
Contribution
It introduces a standardized evaluation methodology and baseline implementations that outperform many existing algorithms in offline MARL.
Findings
Simple baselines match or surpass SOTA on 35 of the 47 datasets used in prior work (almost 75%).
Baselines and evaluation protocols in current offline MARL research are inconsistent, which makes progress hard to assess.
Standardized protocols improve reproducibility and comparison.
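As a quick sanity check on the headline figure, the snippet below recomputes the share of datasets from the counts reported in the abstract (35 out of 47). The helper name `match_rate` is hypothetical and used only for illustration; it is not taken from the paper's code.

```python
def match_rate(matched: int, total: int) -> float:
    """Percentage of datasets on which a baseline matched or surpassed prior SOTA.

    `matched` and `total` are plain counts; this helper is illustrative only.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * matched / total


# Counts as stated in the abstract: 35 of 47 datasets.
rate = match_rate(35, 47)
print(f"{rate:.1f}%")  # prints 74.5%
```

The result, roughly 74.5%, is consistent with the abstract's wording of "almost 75% of cases".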
Abstract
Offline multi-agent reinforcement learning (MARL) is an emerging field with great promise for real-world applications. Unfortunately, the current state of research in offline MARL is plagued by inconsistencies in baselines and evaluation protocols, which ultimately makes it difficult to accurately assess progress, trust newly proposed innovations, and allow researchers to easily build upon prior work. In this paper, we firstly identify significant shortcomings in existing methodologies for measuring the performance of novel algorithms through a representative study of published offline MARL work. Secondly, by directly comparing to this prior work, we demonstrate that simple, well-implemented baselines can achieve state-of-the-art (SOTA) results across a wide range of tasks. Specifically, we show that on 35 out of 47 datasets used in prior work (almost 75% of cases), we match or surpass…
Taxonomy
Topics: Systems Engineering Methodologies and Applications · Safety Systems Engineering in Autonomy
