Scalable Evaluation of Multi-Agent Reinforcement Learning with Melting Pot
Joel Z. Leibo, Edgar Duéñez-Guzmán, Alexander Sasha Vezhnevets, John P. Agapiou, Peter Sunehag, Raphael Koster, Jayd Matyas, Charles Beattie, Igor Mordatch, Thore Graepel

TL;DR
Melting Pot is a scalable evaluation suite for multi-agent reinforcement learning (MARL) that assesses generalization to novel situations using RL-generated test scenarios, revealing algorithm weaknesses that training performance alone does not show.
Contribution
The paper introduces Melting Pot, a MARL evaluation suite that uses reinforcement learning to reduce the human labor needed to create diverse test scenarios for evaluating generalization.
Findings
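Over 80 unique test scenarios were created, covering social dilemmas, reciprocity, resource sharing, and task partitioning.
Applying the scenarios to standard MARL training algorithms reveals weaknesses that are not apparent from training performance alone.
The results demonstrate the importance of evaluating MARL agents beyond training metrics.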
Abstract
Existing evaluation suites for multi-agent reinforcement learning (MARL) do not assess generalization to novel situations as their primary objective (unlike supervised-learning benchmarks). Our contribution, Melting Pot, is a MARL evaluation suite that fills this gap, and uses reinforcement learning to reduce the human labor required to create novel test scenarios. This works because one agent's behavior constitutes (part of) another agent's environment. To demonstrate scalability, we have created over 80 unique test scenarios covering a broad range of research topics such as social dilemmas, reciprocity, resource sharing, and task partitioning. We apply these test scenarios to standard MARL training algorithms, and demonstrate how Melting Pot reveals weaknesses not apparent from training performance alone.
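The idea that one agent's behavior forms part of another agent's environment can be illustrated with a toy sketch. This is not the Melting Pot API: the repeated Prisoner's Dilemma, the payoff values, and every name below (`focal_return`, `evaluate`, the three policies) are hypothetical stand-ins chosen for illustration. Here each fixed co-player policy defines one test scenario for the agent being evaluated.

```python
from typing import Callable, Dict, Optional

# Hypothetical toy game: a repeated Prisoner's Dilemma (assumed payoffs).
COOPERATE, DEFECT = 0, 1
# PAYOFF[(my_action, other_action)] -> my reward
PAYOFF = {
    (COOPERATE, COOPERATE): 3.0,
    (COOPERATE, DEFECT): 0.0,
    (DEFECT, COOPERATE): 5.0,
    (DEFECT, DEFECT): 1.0,
}

# A policy maps the co-player's previous action (None on the first step) to an action.
Policy = Callable[[Optional[int]], int]


def always_cooperate(prev: Optional[int]) -> int:
    return COOPERATE


def always_defect(prev: Optional[int]) -> int:
    return DEFECT


def tit_for_tat(prev: Optional[int]) -> int:
    # Cooperate first, then copy the co-player's last action.
    return COOPERATE if prev is None else prev


def focal_return(focal: Policy, co_player: Policy, steps: int = 10) -> float:
    """Mean per-step reward of the agent under test against a fixed co-player."""
    seen_by_focal: Optional[int] = None
    seen_by_co_player: Optional[int] = None
    total = 0.0
    for _ in range(steps):
        a_focal = focal(seen_by_focal)
        a_other = co_player(seen_by_co_player)
        total += PAYOFF[(a_focal, a_other)]
        seen_by_focal, seen_by_co_player = a_other, a_focal
    return total / steps


def evaluate(focal: Policy, scenarios: Dict[str, Policy]) -> Dict[str, float]:
    """Score one agent across several scenarios, one per fixed co-player."""
    return {name: focal_return(focal, bg) for name, bg in scenarios.items()}


if __name__ == "__main__":
    scenarios = {"vs_cooperator": always_cooperate, "vs_defector": always_defect}
    print(evaluate(always_cooperate, scenarios))
```

In this toy setting, an unconditional cooperator scores well when paired with another cooperator but poorly against a defector, which mirrors the kind of gap between familiar and novel co-players that held-out test scenarios are meant to expose.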
Taxonomy
Topics: Reinforcement Learning in Robotics
