Reinforcement Learning For Constraint Satisfaction Game Agents (15-Puzzle, Minesweeper, 2048, and Sudoku)
Anav Mehta

TL;DR
This paper applies Q-learning and deep Q-learning to four constraint satisfaction games. It shows how reward design and state representation affect the agent's ability to learn effective control policies in environments with large state spaces or partially observable states.
Contribution
The paper introduces an approach to reward shaping and neural network formulation for reinforcement learning in constraint satisfaction games, and evaluates it empirically on all four puzzles.
Findings
100% win rate for low-shuffle 15-Puzzle
High success rates in 2048 variants
Variable success in Minesweeper and Sudoku
Abstract
In recent years, reinforcement learning has attracted renewed interest because of deep Q-learning, in which the Q-function is approximated by a convolutional neural network. Deep reinforcement learning has shown promising results in games such as Atari video games and Go (AlphaGo). Instead of learning an entire Q-table, deep Q-learning learns an estimate of the Q-function, which the policy uses to choose the action in each state. We use Q-learning and deep Q-learning to learn control policies for four constraint satisfaction games (15-Puzzle, Minesweeper, 2048, and Sudoku). The 15-Puzzle is a sliding permutation puzzle whose main challenge is its large state space. Minesweeper and Sudoku involve partially observable states and require guessing. 2048 is also a sliding puzzle, but it admits an easier state representation than the 15-Puzzle and relies on reward shaping to solve the game. These games offer unique insights into the potential and limits of reinforcement…
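The abstract contrasts learning a full Q-table with learning an approximation of the Q-function. As a rough illustration of the tabular side, the sketch below runs standard Q-learning, Q(s,a) ← Q(s,a) + α[r + γ·max_a' Q(s',a') − Q(s,a)], on a hypothetical 2x2 sliding puzzle (a tiny cousin of the 15-Puzzle with only 12 reachable states). The environment, the reward values (a small step penalty plus a goal bonus), and all hyperparameters are assumptions chosen for illustration, not the paper's setup. Deep Q-learning would replace the `Q` dictionary with a neural network fitted toward the same target, which is what makes state spaces as large as the 15-Puzzle's tractable.

```python
import random
from collections import defaultdict

# Hypothetical toy environment: 2x2 sliding puzzle, tiles indexed 0 1 / 2 3.
# A state is a tuple of 4 tiles; 0 marks the blank.
GOAL = (1, 2, 3, 0)
OFFSETS = {"up": -2, "down": 2, "left": -1, "right": 1}  # how the blank moves

def legal_actions(state):
    b = state.index(0)
    acts = []
    if b >= 2: acts.append("up")
    if b < 2: acts.append("down")
    if b % 2 == 1: acts.append("left")
    if b % 2 == 0: acts.append("right")
    return acts

def step(state, action):
    b = state.index(0)
    t = b + OFFSETS[action]
    s = list(state)
    s[b], s[t] = s[t], s[b]  # slide the neighboring tile into the blank
    nxt = tuple(s)
    done = nxt == GOAL
    reward = 1.0 if done else -0.01  # assumed shaping: step penalty, goal bonus
    return nxt, reward, done

def scramble(n, rng):
    # Random legal moves from the goal, so every start state is solvable.
    s = GOAL
    for _ in range(n):
        s, _, _ = step(s, rng.choice(legal_actions(s)))
    return s

def train(episodes=3000, alpha=0.5, gamma=0.9, eps=0.2, max_steps=50, seed=0):
    rng = random.Random(seed)
    Q = defaultdict(float)  # the Q-table: Q[(state, action)], default 0.0
    for _ in range(episodes):
        s = scramble(rng.randint(1, 10), rng)
        for _ in range(max_steps):
            if s == GOAL:
                break
            acts = legal_actions(s)
            # Epsilon-greedy action selection.
            if rng.random() < eps:
                a = rng.choice(acts)
            else:
                a = max(acts, key=lambda x: Q[(s, x)])
            s2, r, done = step(s, a)
            # Q-learning target: terminal reward, or reward + discounted best next value.
            target = r if done else r + gamma * max(Q[(s2, x)] for x in legal_actions(s2))
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = s2
            if done:
                break
    return Q

def solve(Q, start, max_steps=50):
    # Follow the greedy policy derived from the learned Q-table.
    s, path = start, []
    while s != GOAL and len(path) < max_steps:
        a = max(legal_actions(s), key=lambda x: Q[(s, x)])
        s, _, _ = step(s, a)
        path.append(a)
    return path
```

Because every reachable state of this toy fits in a dictionary, the table converges quickly; the same loop on the 15-Puzzle (about 10^13 reachable states) is what motivates function approximation.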
Taxonomy
Topics: Artificial Intelligence in Games · Reinforcement Learning in Robotics · Evolutionary Algorithms and Applications
