Why Global Performance is a Poor Metric for Verifying Convergence of Multi-agent Learning
Sherief Abdallah

TL;DR
This paper highlights the limitations of using global performance metrics to verify stability in multi-agent reinforcement learning, demonstrating that local policy-based metrics can better reveal underlying instabilities.
Contribution
It introduces a new local policy-based metric for assessing stability in multi-agent systems and shows experimentally that it detects instabilities that traditional global metrics miss.
Findings
Global metrics can hide underlying instabilities.
Local policy-based metrics better detect instability.
Proposed metric exposes issues missed by traditional methods.
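The contrast in these findings can be illustrated with a small toy sketch. It is not the paper's experiment or its proposed metric: the two-agent system, the sine-driven policies, the global metric, and every function name below are hypothetical and chosen only to show how a flat global curve can coexist with policies that never settle. The local measure used here is a plain L1 distance between each agent's consecutive policies, which is an assumed stand-in for a policy-based metric.

```python
import math


def global_metric_is_stable(values, window=20, tol=1e-6):
    """True if the last `window` values of the global metric vary by at most `tol`."""
    tail = values[-window:]
    return max(tail) - min(tail) <= tol


def max_policy_change(prev_policies, next_policies):
    """Largest L1 distance between any single agent's consecutive policies."""
    return max(
        sum(abs(a - b) for a, b in zip(p, q))
        for p, q in zip(prev_policies, next_policies)
    )


def toy_policies(t):
    """Hypothetical two-agent system: policies oscillate in opposite directions."""
    shift = 0.4 * math.sin(0.3 * t)
    return [[0.5 + shift, 0.5 - shift], [0.5 - shift, 0.5 + shift]]


def toy_global_performance(policies):
    """Hypothetical global metric: sum of each agent's probability of action 0."""
    return sum(p[0] for p in policies)


# Simulate 100 time steps and record both views of the system.
history = [toy_policies(t) for t in range(100)]
global_series = [toy_global_performance(p) for p in history]
local_series = [max_policy_change(a, b) for a, b in zip(history, history[1:])]

# Global view: the metric looks converged.
global_stable = global_metric_is_stable(global_series)
# Local view: at least one agent's policy is still moving.
local_stable = max(local_series[-20:]) <= 1e-6

print("global metric stable:", global_stable)
print("local policies stable:", local_stable)
```

In this toy, the two agents' oscillations cancel in the sum, so the global check passes while the per-agent check fails. A real study would replace the synthetic policies with logged policies from the learning agents and choose the window and tolerance to suit the task.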
Abstract
Experimental verification has been the method of choice for verifying the stability of a multi-agent reinforcement learning (MARL) algorithm as the number of agents grows and theoretical analysis becomes prohibitively complex. For cooperative agents, where the ultimate goal is to optimize some global metric, the stability is usually verified by observing the evolution of the global performance metric over time. If the global metric improves and eventually stabilizes, it is considered a reasonable verification of the system's stability. The main contribution of this note is establishing the need for better experimental frameworks and measures to assess the stability of large-scale adaptive cooperative systems. We show an experimental case study where the stability of the global performance metric can be rather deceiving, hiding an underlying instability in the system that later leads…
Taxonomy
Topics: Reinforcement Learning in Robotics · Evolutionary Algorithms and Applications · Game Theory and Applications
