Yahtzee: Reinforcement Learning Techniques for Stochastic Combinatorial Games
Nicholas A. Pape

TL;DR
This paper explores reinforcement learning methods for the game Yahtzee, formulating it as an MDP and evaluating policy gradient algorithms, with A2C showing robustness and near-optimal performance.
Contribution
The study applies and compares various policy gradient RL algorithms to Yahtzee, highlighting A2C's robustness and analyzing the challenges in learning optimal strategies.
Findings
A2C trains robustly across different settings.
The agent scores within 5% of the optimal score.
Models struggle with learning the upper bonus strategy.
Abstract
Yahtzee is a classic dice game with a stochastic, combinatorial structure and delayed rewards, making it an interesting mid-scale RL benchmark. While an optimal policy for solitaire Yahtzee can be computed using dynamic programming methods, multiplayer is intractable, motivating approximation methods. We formulate Yahtzee as a Markov Decision Process (MDP), and train self-play agents using various policy gradient methods: REINFORCE, Advantage Actor-Critic (A2C), and Proximal Policy Optimization (PPO), all using a multi-headed network with a shared trunk. We ablate feature and action encodings, architecture, return estimators, and entropy regularization to understand their impact on learning. Under a fixed training budget, REINFORCE and PPO prove sensitive to hyperparameters and fail to reach near-optimal performance, whereas A2C trains robustly across a range of settings. Our agent…
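As a rough illustration of the A2C objective with entropy regularization mentioned in the abstract, the following pure-Python sketch computes the loss value for one episode. It assumes a Monte Carlo return estimator, a single discrete action head, and coefficient values chosen only for illustration; the paper ablates several return estimators and uses a multi-headed network, neither of which is reproduced here. All function and parameter names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def discounted_returns(rewards, gamma=1.0):
    """Monte Carlo returns G_t = r_t + gamma * G_{t+1}, computed backwards."""
    returns, running = [], 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def a2c_loss(logits_per_step, actions, values, rewards,
             gamma=1.0, value_coef=0.5, entropy_coef=0.01):
    """Scalar A2C-style loss for one episode (assumed coefficients).

    loss = mean(policy_loss + value_coef * value_loss - entropy_coef * entropy)
    In an autodiff framework the advantage would be detached from the
    gradient in the policy term; here only the value is computed.
    """
    returns = discounted_returns(rewards, gamma)
    policy_loss = value_loss = entropy = 0.0
    for logits, action, value, ret in zip(logits_per_step, actions, values, returns):
        probs = softmax(logits)
        advantage = ret - value  # critic baseline reduces variance
        policy_loss += -math.log(probs[action]) * advantage
        value_loss += (ret - value) ** 2
        entropy += -sum(p * math.log(p) for p in probs if p > 0.0)
    n = len(rewards)
    return (policy_loss + value_coef * value_loss - entropy_coef * entropy) / n
```

The entropy term enters with a negative sign, so minimizing the loss rewards policies that keep some probability mass on alternative actions, which is the usual motivation for entropy regularization.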
Taxonomy
Topics: Artificial Intelligence in Games · Reinforcement Learning in Robotics · Educational Games and Gamification
