Reinforcement Learning for Hanabi

Nina Cohen; Kordel K. France

arXiv:2506.00458 · cs.LG · June 3, 2025

TL;DR

This paper evaluates tabular and deep reinforcement learning algorithms in Hanabi, a cooperative card game with incomplete information, and finds that temporal difference methods such as Expected SARSA and Deep Q-Learning perform best.

Contribution

The study compares tabular and deep RL algorithms in Hanabi, identifying which algorithms perform best when paired with an agent of the same type or of a different type, and analyzing the interactions between agents.

Findings

01

TD algorithms outperform tabular agents in overall performance

02

Expected SARSA and Deep Q-Learning agents achieve the highest scores

03

Certain agents play their highest-scoring games against specific agents, while others score higher on average by adapting to the other agent's behavior

Abstract

Hanabi has become a popular game for reinforcement learning (RL) research because it is one of the few cooperative card games in which players have incomplete knowledge of the environment, which makes it challenging for an RL agent. We explored different tabular and deep reinforcement learning algorithms to see which performed best, both against an agent of the same type and against other types of agents. We established that certain agents played their highest-scoring games against specific agents, while others achieved higher scores on average by adapting to the opposing agent's behavior. We attempted to quantify the conditions under which each algorithm provides the best advantage and identified the most interesting interactions between agents of different types. In the end, we found that temporal difference (TD) algorithms had better overall performance and…

Taxonomy

Topics: Elevator Systems and Control

Methods: Expected SARSA · SARSA · Q-Learning
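
The three tabular methods listed above differ only in how they bootstrap from the next state. The following is a minimal sketch of that difference, not the paper's implementation: the function names (`epsilon_greedy_probs`, `td_target`, `td_update`), the epsilon-greedy behavior policy, and all parameter values are assumptions chosen for illustration.

```python
def epsilon_greedy_probs(q_values, epsilon):
    """Action probabilities of an epsilon-greedy policy (assumed policy)."""
    n = len(q_values)
    probs = [epsilon / n] * n
    greedy = max(range(n), key=lambda a: q_values[a])
    probs[greedy] += 1.0 - epsilon
    return probs


def td_target(method, reward, next_q, next_action, gamma, epsilon, done=False):
    """One-step TD target for SARSA, Q-Learning, or Expected SARSA.

    next_q holds the current Q estimates for every action in the next state;
    next_action is the action actually selected there (used by SARSA only).
    """
    if done:
        # No bootstrapping from a terminal state.
        return reward
    if method == "sarsa":
        # On-policy: value of the action that was actually sampled.
        bootstrap = next_q[next_action]
    elif method == "q_learning":
        # Off-policy: value of the greedy action.
        bootstrap = max(next_q)
    elif method == "expected_sarsa":
        # Expectation of Q over the policy's action distribution.
        probs = epsilon_greedy_probs(next_q, epsilon)
        bootstrap = sum(p * q for p, q in zip(probs, next_q))
    else:
        raise ValueError(f"unknown method: {method}")
    return reward + gamma * bootstrap


def td_update(q_sa, target, alpha):
    """Move the current estimate Q(s, a) a step of size alpha toward the target."""
    return q_sa + alpha * (target - q_sa)


if __name__ == "__main__":
    # Hypothetical numbers: two actions in the next state.
    next_q = [1.0, 3.0]
    for method in ("sarsa", "q_learning", "expected_sarsa"):
        target = td_target(method, reward=1.0, next_q=next_q,
                           next_action=0, gamma=0.5, epsilon=0.2)
        print(method, target, td_update(0.0, target, alpha=0.5))
```

Because Expected SARSA averages over the policy's action probabilities instead of relying on a single sampled action, its target has lower variance than SARSA's for the same policy. With `epsilon=0` the epsilon-greedy policy becomes greedy and the Expected SARSA target coincides with the Q-Learning target.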