Towards automating Codenames spymasters with deep reinforcement learning
Sherman Siu

TL;DR
This paper applies deep reinforcement learning to the cooperative word game Codenames. It formulates the game as a Markov Decision Process and tests SAC, PPO, and A2C on it; none of them converge, which highlights how hard convergence is in this setting.
Contribution
To the author's knowledge, it is the first work to model Codenames as an MDP and to evaluate well-known RL algorithms (SAC, PPO, A2C) on this language-based cooperative game.
Findings
The RL algorithms tested (SAC, PPO, A2C) did not converge on the Codenames environment
The algorithms converged on the simplified ClickPixel environment only when the board size was small
The results highlight the challenges of applying RL to language-based cooperative games
Abstract
Most reinforcement learning (RL) research has centered on competitive games, and little work has applied it to co-operative multiplayer games or text-based games. Codenames is a board game that involves both asymmetric co-operation and natural language processing, which makes it an excellent candidate for advancing RL research. To my knowledge, this work is the first to formulate Codenames as a Markov Decision Process and apply well-known reinforcement learning algorithms such as SAC, PPO, and A2C to the environment. None of these algorithms converges on the Codenames environment, nor do they converge on a simplified environment called ClickPixel, except when the board size is small.
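The abstract does not define ClickPixel, so the following is only a minimal sketch under stated assumptions, not the author's environment. It assumes a single-step task on an n x n board with exactly one lit pixel, where the agent is rewarded for clicking that pixel. The class name `ToyClickPixelEnv`, the reward scheme, and the helper `random_policy_hit_rate` are all hypothetical. Under these assumptions, a uniformly random policy is rewarded with probability 1/n², which illustrates one plausible way that larger boards make the learning signal sparser.

```python
import random
from typing import List, Optional, Tuple


class ToyClickPixelEnv:
    """Hypothetical single-step task: click the one lit pixel on an n x n board.

    This is an illustrative assumption, not the paper's ClickPixel definition.
    """

    def __init__(self, n: int, rng: Optional[random.Random] = None) -> None:
        self.n = n
        self.rng = rng if rng is not None else random.Random()
        self.target = 0
        self.board: List[int] = []

    def reset(self) -> List[int]:
        # State: flattened board with exactly one lit pixel (assumed layout).
        cells = self.n * self.n
        self.target = self.rng.randrange(cells)
        self.board = [1 if i == self.target else 0 for i in range(cells)]
        return self.board

    def step(self, action: int) -> Tuple[List[int], float, bool]:
        # Action: index of the clicked pixel. Reward is 1.0 on a hit, else 0.0.
        reward = 1.0 if action == self.target else 0.0
        # The episode ends after a single click in this toy version.
        return self.board, reward, True


def random_policy_hit_rate(n: int, episodes: int = 10_000, seed: int = 0) -> float:
    """Estimate how often a uniformly random policy earns the reward."""
    policy_rng = random.Random(seed)
    env = ToyClickPixelEnv(n, random.Random(seed + 1))
    hits = 0.0
    for _ in range(episodes):
        env.reset()
        _, reward, _ = env.step(policy_rng.randrange(n * n))
        hits += reward
    return hits / episodes


if __name__ == "__main__":
    for size in (2, 4, 8):
        rate = random_policy_hit_rate(size)
        print(f"board {size}x{size}: random hit rate ~ {rate:.4f}")
```

Running the script prints the estimated hit rate for a few board sizes; in this toy setup the rate should fall roughly as 1/n², so an agent exploring at random sees far fewer rewarded episodes on larger boards.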
Taxonomy
Topics: Biochemical and Structural Characterization
Methods: Dilated Convolution · Convolution · Global Average Pooling · 1x1 Convolution · Average Pooling · Switchable Atrous Convolution · A2C · Entropy Regularization · Proximal Policy Optimization
