Towards automating Codenames spymasters with deep reinforcement learning

Sherman Siu

arXiv:2212.14104·cs.CL·January 2, 2023

TL;DR

This paper applies deep reinforcement learning to the cooperative word game Codenames, formulating it as a Markov Decision Process and testing SAC, PPO, and A2C. None of the algorithms converged on the Codenames environment.

Contribution

To the author's knowledge, it is the first work to formulate Codenames as an MDP and evaluate well-known RL algorithms (SAC, PPO, A2C) on this language-based cooperative game.

Findings

01. SAC, PPO, and A2C did not converge on the Codenames environment.

02. On the simplified ClickPixel environment, the algorithms converged only when the board size was small.

03. The results highlight how difficult it is to apply RL to language-based cooperative games.

Abstract

Most reinforcement learning research has centered on competitive games; little work has been done on applying it to co-operative multiplayer games or text-based games. Codenames is a board game that involves both asymmetric co-operation and natural language processing, which makes it an excellent candidate for advancing RL research. To my knowledge, this work is the first to formulate Codenames as a Markov Decision Process and apply well-known reinforcement learning algorithms such as SAC, PPO, and A2C to the environment. None of these algorithms converge on the Codenames environment, and they also fail to converge on a simplified environment called ClickPixel, except when the board size is small.
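
The abstract does not define ClickPixel or the Codenames MDP, so the following is only an illustrative sketch of what a board-based, single-step MDP can look like and how the number of actions grows with board size. Everything in it is an assumption: the class `ToyClickPixelEnv`, its one-lit-pixel rule, the reward of 1 for a correct click, and the helper `random_policy_success_rate` are hypothetical names and rules chosen for illustration, not the author's environment.

```python
import random


class ToyClickPixelEnv:
    """Hypothetical one-step environment (NOT the paper's ClickPixel).

    Assumed rules: an n x n board has exactly one lit pixel; the agent
    picks one cell; reward is 1.0 for the lit cell and 0.0 otherwise.
    """

    def __init__(self, board_size, seed=None):
        self.board_size = board_size
        # One action per cell, so the action space grows quadratically.
        self.n_actions = board_size * board_size
        self._rng = random.Random(seed)
        self._target = None

    def reset(self):
        """Start an episode and return the flattened board as the state."""
        self._target = self._rng.randrange(self.n_actions)
        observation = [0] * self.n_actions
        observation[self._target] = 1
        return observation

    def step(self, action):
        """Apply one click and return (reward, done)."""
        if self._target is None:
            raise RuntimeError("call reset() before step()")
        reward = 1.0 if action == self._target else 0.0
        self._target = None  # the episode ends after a single click
        return reward, True


def random_policy_success_rate(board_size, episodes=2000, seed=0):
    """Estimate how often a uniformly random policy clicks the lit cell."""
    env = ToyClickPixelEnv(board_size, seed=seed)
    policy_rng = random.Random(seed + 1)
    hits = 0.0
    for _ in range(episodes):
        env.reset()
        reward, _done = env.step(policy_rng.randrange(env.n_actions))
        hits += reward
    return hits / episodes


if __name__ == "__main__":
    for size in (2, 5, 10):
        rate = random_policy_success_rate(size)
        print(f"board {size}x{size}: random-policy success rate {rate:.3f}")
```

Under these assumed rules a random policy succeeds with probability 1/n² on an n x n board, so reward becomes sparser as the board grows. This is one plausible way board size can affect learning in a toy setting; it is not a claim about why the algorithms in the paper failed to converge.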

Taxonomy

Methods: Soft Actor-Critic (SAC) · A2C · Entropy Regularization · Proximal Policy Optimization (PPO)