Reinforcement Learning for Hanabi
Nina Cohen, Kordel K. France

TL;DR
This paper evaluates various reinforcement learning algorithms in Hanabi, a cooperative card game with incomplete information, finding that temporal difference methods like Expected SARSA and Deep Q-Learning perform best.
Contribution
The study compares tabular and deep RL algorithms in Hanabi, identifying which algorithms perform best with different partner agents and analyzing how agents of different types interact.
Findings
TD algorithms achieve better overall performance than tabular agents
Expected SARSA and Deep Q-Learning agents achieve the highest scores
Some agents score highest with specific partners, while others achieve higher average scores by adapting to the other agent's behavior
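For the Deep Q-Learning agent named in the findings, the key difference from tabular Q-learning is that Q-values come from a neural network trained by regression toward bootstrapped targets. The pure-Python sketch below shows only the target and loss computation for a batch of transitions. The separate target network, the function names `dqn_targets` and `td_loss`, and the squared-error loss are assumptions for illustration; this summary does not describe the authors' architecture.

```python
from typing import List, Sequence


def dqn_targets(rewards: Sequence[float], dones: Sequence[bool],
                next_q_values: Sequence[Sequence[float]],
                gamma: float = 0.99) -> List[float]:
    """Targets y = r + gamma * (1 - done) * max_a' Q_target(s', a').

    next_q_values[i] is assumed (hypothetically) to hold the target network's
    action values for the i-th next state; no network is built here.
    """
    if not (len(rewards) == len(dones) == len(next_q_values)):
        raise ValueError("batch components must have equal length")
    targets = []
    for r, done, q_next in zip(rewards, dones, next_q_values):
        # Terminal transitions contribute no future value.
        bootstrap = 0.0 if done else max(q_next)
        targets.append(r + gamma * bootstrap)
    return targets


def td_loss(predicted: Sequence[float], targets: Sequence[float]) -> float:
    """Mean squared error between predicted Q(s, a) values and the targets."""
    if len(predicted) != len(targets) or not targets:
        raise ValueError("need equal-length, non-empty batches")
    return sum((p - y) ** 2 for p, y in zip(predicted, targets)) / len(targets)
```

In a full DQN the targets are held fixed while the loss is minimized with respect to the online network's parameters only.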
Abstract
Hanabi has become a popular game for reinforcement learning (RL) research because it is one of the few cooperative card games in which players have incomplete knowledge of the environment, which makes it challenging for an RL agent. We explored several tabular and deep RL algorithms to see which performed best, both when paired with an agent of the same type and when paired with agents of other types. We found that certain agents played their highest-scoring games with specific partner agents, while others achieved higher average scores by adapting to the other agent's behavior. We attempted to quantify the conditions under which each algorithm offers the greatest advantage and identified the most interesting interactions between agents of different types. In the end, we found that temporal difference (TD) algorithms had better overall performance and…
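The TD methods mentioned here share a single update and differ only in how they estimate the value of the next state. In standard one-step form, with step size α, discount γ, and the agent's policy π, they can be written as follows. This is generic textbook notation rather than notation taken from the paper, and B is shorthand introduced here for the bootstrap term.

```latex
\begin{aligned}
Q(S_t, A_t) &\leftarrow Q(S_t, A_t) + \alpha \bigl[ R_{t+1} + \gamma\, B_{t+1} - Q(S_t, A_t) \bigr] \\
\text{SARSA:} \quad B_{t+1} &= Q(S_{t+1}, A_{t+1}) \\
\text{Q-learning:} \quad B_{t+1} &= \max_{a} Q(S_{t+1}, a) \\
\text{Expected SARSA:} \quad B_{t+1} &= \sum_{a} \pi(a \mid S_{t+1})\, Q(S_{t+1}, a)
\end{aligned}
```

Deep Q-Learning uses the Q-learning bootstrap term but represents Q with a neural network instead of a table.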
Taxonomy
Topics: Elevator Systems and Control
Methods: Expected SARSA · SARSA · Q-Learning
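The three methods listed above can be implemented by a single tabular agent that changes only its bootstrap term. The sketch below is a minimal, hypothetical illustration, not the authors' code: the class `TabularTDAgent`, its parameters, the epsilon-greedy policy, and the use of arbitrary hashable keys as states are all assumptions, and no Hanabi state encoding is shown.

```python
import random
from collections import defaultdict
from typing import Hashable, List, Optional

RULES = ("sarsa", "q_learning", "expected_sarsa")


class TabularTDAgent:
    """Hypothetical tabular agent; states are any hashable key."""

    def __init__(self, n_actions: int, rule: str = "expected_sarsa",
                 alpha: float = 0.1, gamma: float = 0.99,
                 epsilon: float = 0.1, seed: int = 0) -> None:
        if rule not in RULES:
            raise ValueError(f"unknown rule: {rule}")
        self.n_actions = n_actions
        self.rule = rule
        self.alpha = alpha      # step size
        self.gamma = gamma      # discount factor
        self.epsilon = epsilon  # exploration rate
        self.rng = random.Random(seed)
        # Action values per state, created lazily as zeros.
        self.q = defaultdict(lambda: [0.0] * n_actions)

    def greedy(self, state: Hashable) -> int:
        values = self.q[state]
        return values.index(max(values))  # ties go to the lowest index

    def act(self, state: Hashable) -> int:
        """Epsilon-greedy action selection."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_actions)
        return self.greedy(state)

    def policy_probs(self, state: Hashable) -> List[float]:
        """Probabilities the epsilon-greedy policy assigns to each action."""
        probs = [self.epsilon / self.n_actions] * self.n_actions
        probs[self.greedy(state)] += 1.0 - self.epsilon
        return probs

    def update(self, s: Hashable, a: int, r: float, s_next: Hashable,
               a_next: Optional[int] = None, done: bool = False) -> float:
        """Apply one TD update to Q[s][a] and return the TD error."""
        if done:
            bootstrap = 0.0  # no future value after a terminal transition
        elif self.rule == "sarsa":
            if a_next is None:
                raise ValueError("SARSA needs the next action a_next")
            bootstrap = self.q[s_next][a_next]  # value of the action actually taken
        elif self.rule == "q_learning":
            bootstrap = max(self.q[s_next])  # greedy (off-policy) value
        else:
            # Expected SARSA: average over the agent's own epsilon-greedy policy.
            probs = self.policy_probs(s_next)
            bootstrap = sum(p * v for p, v in zip(probs, self.q[s_next]))
        td_error = r + self.gamma * bootstrap - self.q[s][a]
        self.q[s][a] += self.alpha * td_error
        return td_error
```

Keeping the three rules in one class makes it explicit that they share the same update. Expected SARSA averages over the policy instead of sampling the next action, which removes one source of variance relative to SARSA; whether that explains the results summarized above is not established here.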
