ZSC-Eval: An Evaluation Toolkit and Benchmark for Multi-agent Zero-shot Coordination

Xihuai Wang; Shao Zhang; Wenhao Zhang; Wentao Dong; Jingxiao Chen; Ying Wen; Weinan Zhang

arXiv:2310.05208 · cs.AI · September 27, 2024 · 1 citation

Open Access 2 Repos

TL;DR

ZSC-Eval is a toolkit and benchmark for evaluating zero-shot coordination (ZSC) algorithms in multi-agent reinforcement learning. It targets the distribution gap between evaluation partners and deployment-time partners, and it introduces new metrics for measuring ZSC performance.

Contribution

The paper introduces ZSC-Eval, the first dedicated evaluation toolkit and benchmark for ZSC algorithms, including new partner generation, selection, and performance measurement methods.

Findings

01. The benchmark covers the Overcooked and Google Research Football environments.

02. Empirical analysis reveals strengths and weaknesses of current ZSC algorithms.

03. Human experiments validate that the evaluation metrics align with human judgment.

Abstract

Zero-shot coordination (ZSC) is a new cooperative multi-agent reinforcement learning (MARL) challenge that aims to train an ego agent to work with diverse, unseen partners during deployment. The significant difference between the deployment-time partners' distribution and the training partners' distribution determined by the training algorithm makes ZSC a unique out-of-distribution (OOD) generalization challenge. The potential distribution gap between evaluation and deployment-time partners leads to inadequate evaluation, which is exacerbated by the lack of appropriate evaluation metrics. In this paper, we present ZSC-Eval, the first evaluation toolkit and benchmark for ZSC algorithms. ZSC-Eval consists of: 1) Generation of evaluation partner candidates through behavior-preferring rewards to approximate deployment-time partners' distribution; 2) Selection of evaluation partners by…
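The abstract does not say what form the behavior-preferring rewards take. As a rough illustration only, the sketch below assumes a linear form: the task reward plus a weighted bonus for each event a candidate partner is meant to prefer. The function name, the event names, and the weights are all hypothetical and are not taken from the paper or its toolkit.

```python
from typing import Dict


def behavior_preferring_reward(
    task_reward: float,
    event_counts: Dict[str, int],
    preference_weights: Dict[str, float],
) -> float:
    """Return the task reward plus a weighted bonus per preferred event.

    Assumption: a linear combination of event counts. The paper may
    define its behavior-preferring rewards differently.
    """
    bonus = sum(
        preference_weights.get(event, 0.0) * count
        for event, count in event_counts.items()
    )
    return task_reward + bonus


# Hypothetical Overcooked-style events observed in a single step.
events = {"pickup_onion": 1, "deliver_soup": 0}

# A hypothetical candidate partner that is rewarded for handling onions.
onion_preference = {"pickup_onion": 0.5}

shaped = behavior_preferring_reward(0.0, events, onion_preference)
unshaped = behavior_preferring_reward(0.0, events, {})
```

Under this assumption, training one policy per weight vector would yield a pool of candidates with different behavioral tendencies, which is one way to read the "generation of evaluation partner candidates" step.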

Taxonomy

Topics: Action Observation and Synchronization