Towards Few-shot Coordination: Revisiting Ad-hoc Teamplay Challenge In the Game of Hanabi

Hadi Nekoei; Xutong Zhao; Janarthanan Rajendran; Miao Liu; Sarath Chandar

arXiv:2308.10284·cs.LG·August 22, 2023

TL;DR

This paper evaluates the adaptability of zero-shot coordination algorithms in multi-agent reinforcement learning within the Hanabi game, revealing that naive methods can match state-of-the-art algorithms in adaptation speed and highlighting the importance of training hyper-parameters.

Contribution

The paper introduces a new framework and metric for assessing agent adaptability in Hanabi, and provides empirical insights into how hyper-parameters influence the adaptability of MARL algorithms.

Findings

01. Naive Independent Q-Learning (IQL) agents adapt as quickly as the state-of-the-art Off-Belief Learning (OBL) agents in most cases.

02. Hyper-parameters controlling data diversity and optimization significantly affect adaptability.

03. Current ZSC algorithms require extensive interaction samples to adapt to new partners.

Abstract

Cooperative Multi-agent Reinforcement Learning (MARL) algorithms with Zero-Shot Coordination (ZSC) have gained significant attention in recent years. ZSC refers to the ability of agents to coordinate zero-shot (without additional interaction experience) with independently trained agents. While ZSC is crucial for cooperative MARL agents, it might not be possible for complex tasks and changing environments. Agents also need to adapt and improve their performance with minimal interaction with other agents. In this work, we show empirically that state-of-the-art ZSC algorithms have poor performance when paired with agents trained with different learning methods, and they require millions of interaction samples to adapt to these new partners. To investigate this issue, we formally defined a framework based on a popular cooperative multi-agent game called Hanabi to evaluate the adaptability…

Code & Models

Repositories

chandar-lab/adaptive-hanabi
PyTorch · Official

Taxonomy

Topics: Reinforcement Learning in Robotics · Data Stream Mining Techniques

Methods: Q-Learning