"Other-Play" for Zero-Shot Coordination
Hengyuan Hu, Adam Lerer, Alex Peysakhovich, Jakob Foerster

TL;DR
This paper introduces other-play, a novel learning algorithm that improves zero-shot coordination by leveraging symmetries, enabling AI agents to better coordinate with unseen partners including humans.
Contribution
The paper proposes other-play, a new algorithm that enhances self-play by exploiting problem symmetries to improve zero-shot coordination in multi-agent settings.
Findings
In Hanabi, OP agents score higher than self-play (SP) agents when paired with partners they were not trained with.
OP agents achieve higher scores than SP agents when paired with human players.
The paper characterizes OP theoretically as well as experimentally.
Abstract
We consider the problem of zero-shot coordination - constructing AI agents that can coordinate with novel partners they have not seen before (e.g. humans). Standard Multi-Agent Reinforcement Learning (MARL) methods typically focus on the self-play (SP) setting where agents construct strategies by playing the game with themselves repeatedly. Unfortunately, applying SP naively to the zero-shot coordination problem can produce agents that establish highly specialized conventions that do not carry over to novel partners they have not been trained with. We introduce a novel learning algorithm called other-play (OP), that enhances self-play by looking for more robust strategies, exploiting the presence of known symmetries in the underlying problem. We characterize OP theoretically as well as experimentally. We study the cooperative card game Hanabi and show that OP agents achieve higher…
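To make the contrast between self-play and other-play concrete, here is a minimal sketch on a toy matching game chosen purely for illustration. It is an assumption of this note, not taken from the summary above. Two players each pick one of ten actions and are rewarded only if they pick the same one. Nine actions pay 1.0 and are interchangeable, and one pays 0.9. The function names and the choice to model the symmetries as permutations among equal-payoff actions are hypothetical simplifications, not the authors' implementation.

```python
import numpy as np

def self_play_value(pi1, pi2, payoff):
    """Expected return when both players follow the given policies unchanged."""
    return float(np.sum(pi1 * pi2 * payoff))

def other_play_value(pi1, pi2, payoff):
    """Expected return when the partner's policy is relabeled by a uniformly
    random symmetry of the game (assumed here: any permutation among actions
    that share the same payoff)."""
    # Averaging over all permutations within a class of equivalent actions
    # spreads the partner's probability mass evenly across that class.
    pi2_sym = np.empty_like(pi2)
    for r in np.unique(payoff):
        mask = payoff == r
        pi2_sym[mask] = pi2[mask].mean()
    return self_play_value(pi1, pi2_sym, payoff)

# Toy game (illustrative assumption): nine actions pay 1.0, one pays 0.9.
payoff = np.array([1.0] * 9 + [0.9])
n = len(payoff)
one_hot = np.eye(n)

# Evaluate every deterministic "both pick action a" convention.
sp_values = [self_play_value(one_hot[a], one_hot[a], payoff) for a in range(n)]
op_values = [other_play_value(one_hot[a], one_hot[a], payoff) for a in range(n)]

best_sp = int(np.argmax(sp_values))  # an arbitrary one of the 1.0 actions
best_op = int(np.argmax(op_values))  # the unique 0.9 action
```

Under these assumptions, self-play is content with an arbitrary convention among the nine equivalent actions, which an independently trained partner is unlikely to share. The other-play objective penalizes such conventions and prefers the one action that can be identified without prior agreement, even though its nominal payoff is lower.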
Taxonomy
Topics: Reinforcement Learning in Robotics · Artificial Intelligence in Games · Sports Analytics and Performance
