AlphaSnake: Policy Iteration on a Nondeterministic NP-hard Markov Decision Process
Kevin Du, Ian Gemp, Yi Wu, Yingying Wu

TL;DR
This paper introduces AlphaSnake, an AlphaZero-inspired algorithm that uses Monte Carlo Tree Search (MCTS) to learn policies for the NP-hard game of Snake, and reports a win rate above 0.5.
Contribution
The paper applies policy iteration via MCTS to the game of Snake, an NP-hard problem modeled as a stochastic Markov Decision Process (MDP).
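To make the search step more concrete, here is a minimal sketch of the PUCT selection rule used in AlphaZero-style MCTS. It is an illustration only: this page does not state AlphaSnake's exact selection rule, and the function name `puct_select`, the dictionary layout, and the constant `c_puct=1.5` are all assumptions.

```python
import math


def puct_select(priors, visit_counts, q_values, c_puct=1.5):
    """Pick the action maximizing Q(s, a) + U(s, a) at one tree node.

    priors, visit_counts, and q_values are dicts keyed by action.
    All names and the default c_puct are hypothetical, not from the paper.
    """
    total_visits = sum(visit_counts.values())

    def score(action):
        # Exploration bonus: large for actions with a high prior and few visits.
        bonus = (c_puct * priors[action] * math.sqrt(total_visits)
                 / (1 + visit_counts[action]))
        return q_values[action] + bonus

    return max(priors, key=score)
```

In an AlphaZero-style loop, the visit counts gathered by repeated selection act as an improved policy, and the network is then trained toward them; that alternation is what makes the procedure a form of policy iteration.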
Findings
Achieved a win rate over 0.5 in Snake.
First to demonstrate AlphaZero's effectiveness on NP-hard environments.
Surpassed previous algorithms in game performance.
Abstract
Reinforcement learning has recently been used to approach well-known NP-hard combinatorial problems in graph theory. Among these problems, Hamiltonian cycle problems are exceptionally difficult to analyze, even when restricted to individual instances of structurally complex graphs. In this paper, we use Monte Carlo Tree Search (MCTS), the search algorithm behind many state-of-the-art reinforcement learning algorithms such as AlphaZero, to create autonomous agents that learn to play the game of Snake, a game centered on properties of Hamiltonian cycles on grid graphs. The game of Snake can be formulated as a single-player discounted Markov Decision Process (MDP) where the agent must behave optimally in a stochastic environment. Determining the optimal policy for Snake, defined as the policy that maximizes the probability of winning - or win rate - with higher priority and minimizes the…
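The MDP view of Snake can be sketched in a few lines. The example below is an assumption-laden illustration, not the paper's environment: the names `SnakeState` and `step`, the reward of 1 for filling the board and 0 otherwise, and uniform food placement over free cells are all hypothetical choices. The only source of randomness is where the next food appears, which is what makes the transition function stochastic.

```python
import random
from dataclasses import dataclass

# Hypothetical action set: (row delta, column delta) per move.
MOVES = {"U": (-1, 0), "D": (1, 0), "L": (0, -1), "R": (0, 1)}


@dataclass(frozen=True)
class SnakeState:
    size: int     # the board is size x size
    body: tuple   # cells from head to tail, each (row, col)
    food: tuple   # (row, col), or None once the board is full


def step(state, action, rng):
    """Apply one move and return (next_state, reward, done)."""
    dr, dc = MOVES[action]
    head = (state.body[0][0] + dr, state.body[0][1] + dc)
    eats = head == state.food
    # The tail cell is vacated this turn unless the snake grows.
    occupied = state.body if eats else state.body[:-1]
    in_bounds = 0 <= head[0] < state.size and 0 <= head[1] < state.size
    if not in_bounds or head in occupied:
        return state, 0.0, True  # collision: the episode ends in a loss
    body = (head,) + occupied
    if not eats:
        return SnakeState(state.size, body, state.food), 0.0, False
    free = [(r, c) for r in range(state.size) for c in range(state.size)
            if (r, c) not in body]
    if not free:
        return SnakeState(state.size, body, None), 1.0, True  # board full: win
    # Stochastic transition: the new food cell is drawn uniformly at random.
    return SnakeState(state.size, body, rng.choice(free)), 0.0, False
```

Passing the random generator explicitly keeps `step` reproducible, which is convenient when a tree search needs to sample several possible food placements from the same state.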
Taxonomy
Topics: Reinforcement Learning in Robotics · Artificial Intelligence in Games · Digital Games and Media
Methods: AlphaZero
