ZSC-Eval: An Evaluation Toolkit and Benchmark for Multi-agent Zero-shot Coordination
Xihuai Wang, Shao Zhang, Wenhao Zhang, Wentao Dong, Jingxiao Chen, Ying Wen, Weinan Zhang

TL;DR
ZSC-Eval is a toolkit and benchmark for evaluating zero-shot coordination algorithms in multi-agent reinforcement learning. It targets the distribution gap between evaluation and deployment-time partners and introduces new evaluation metrics.
Contribution
The paper introduces ZSC-Eval, the first dedicated evaluation toolkit and benchmark for ZSC algorithms, including new partner generation, selection, and performance measurement methods.
Findings
ZSC algorithms are benchmarked in the Overcooked and Google Research Football environments.
Empirical analysis reveals strengths and weaknesses of current ZSC algorithms.
Human experiments validate the evaluation metrics' alignment with human judgment.
Abstract
Zero-shot coordination (ZSC) is a new cooperative multi-agent reinforcement learning (MARL) challenge that aims to train an ego agent to work with diverse, unseen partners during deployment. The significant difference between the deployment-time partners' distribution and the training partners' distribution determined by the training algorithm makes ZSC a unique out-of-distribution (OOD) generalization challenge. The potential distribution gap between evaluation and deployment-time partners leads to inadequate evaluation, which is exacerbated by the lack of appropriate evaluation metrics. In this paper, we present ZSC-Eval, the first evaluation toolkit and benchmark for ZSC algorithms. ZSC-Eval consists of: 1) Generation of evaluation partner candidates through behavior-preferring rewards to approximate deployment-time partners' distribution; 2) Selection of evaluation partners by…
Taxonomy
Topics: Action Observation and Synchronization
