Towards Few-shot Coordination: Revisiting Ad-hoc Teamplay Challenge In the Game of Hanabi
Hadi Nekoei, Xutong Zhao, Janarthanan Rajendran, Miao Liu, Sarath Chandar

TL;DR
This paper evaluates the adaptability of zero-shot coordination algorithms in multi-agent reinforcement learning within the Hanabi game, revealing that naive methods can match state-of-the-art algorithms in adaptation speed and highlighting the importance of training hyper-parameters.
Contribution
The paper introduces a new framework and metric for assessing agent adaptability in Hanabi, and provides empirical insights into how hyper-parameters influence the adaptability of MARL algorithms.
Findings
Naive Independent Q-Learning (IQL) agents adapt as quickly as the state-of-the-art Off-Belief Learning (OBL) agents in most cases.
Hyper-parameters controlling data diversity and optimization significantly affect adaptability.
Current ZSC algorithms require extensive interaction samples to adapt to new partners.
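For readers unfamiliar with the IQL baseline named in the findings, the following is a minimal sketch of independent Q-learning: each agent runs ordinary Q-learning on its own table and treats its partner as part of the environment. This is not the paper's implementation. The toy one-step matching game, the function names (`q_update`, `epsilon_greedy`, `train_iql`), and all hyper-parameter values are assumptions chosen for illustration; the paper's agents play Hanabi, which is far larger and partially observable.

```python
import random


def q_update(q, state, action, reward, next_state, done, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning step to a single agent's own table."""
    target = reward if done else reward + gamma * max(q[next_state])
    q[state][action] += alpha * (target - q[state][action])


def epsilon_greedy(q, state, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q[state]))
    values = q[state]
    return values.index(max(values))


def train_iql(episodes=2000, epsilon=0.1, seed=0):
    """Train two independent learners on a toy cooperative game.

    Assumed toy game (NOT Hanabi): one state, two actions per agent,
    shared reward 1.0 if both agents pick the same action, else 0.0.
    """
    rng = random.Random(seed)
    # One Q-table per agent, indexed as q[state][action].
    q_tables = [[[0.0, 0.0]], [[0.0, 0.0]]]
    for _ in range(episodes):
        actions = [epsilon_greedy(q, 0, epsilon, rng) for q in q_tables]
        reward = 1.0 if actions[0] == actions[1] else 0.0
        # Each agent updates only its own table from the shared reward;
        # neither agent observes or models the other's action.
        for q, a in zip(q_tables, actions):
            q_update(q, 0, a, reward, 0, done=True)
    return q_tables
```

Calling `train_iql()` returns both agents' tables. Because neither learner models its partner, any convention they converge on is specific to the pair that trained together, which is the setting in which adaptation to a new partner becomes the question of interest.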
Abstract
Cooperative Multi-agent Reinforcement Learning (MARL) algorithms with Zero-Shot Coordination (ZSC) have gained significant attention in recent years. ZSC refers to the ability of agents to coordinate zero-shot (without additional interaction experience) with independently trained agents. While ZSC is crucial for cooperative MARL agents, it might not be possible for complex tasks and changing environments. Agents also need to adapt and improve their performance with minimal interaction with other agents. In this work, we show empirically that state-of-the-art ZSC algorithms have poor performance when paired with agents trained with different learning methods, and they require millions of interaction samples to adapt to these new partners. To investigate this issue, we formally defined a framework based on a popular cooperative multi-agent game called Hanabi to evaluate the adaptability…
Taxonomy
Topics: Reinforcement Learning in Robotics · Data Stream Mining Techniques
Methods: Q-Learning
