Tackling Morpion Solitaire with AlphaZero-like Ranked Reward Reinforcement Learning
Hui Wang, Mike Preuss, Michael Emmerich, Aske Plaat

TL;DR
This paper applies AlphaZero-inspired deep reinforcement learning with ranked reward to Morpion Solitaire, reaching a 67-step solution that is close to the human best of 68, with less computational effort.
Contribution
It introduces a ranked-reward-based self-play reinforcement learning framework for Morpion Solitaire, reaching solutions close to the human best without extensive domain-specific tuning.
Findings
Achieved a 67-step solution close to the human best of 68.
Demonstrated the effectiveness of ranked reward in sparse reward environments.
Outlined avenues for further improvement of game-solving algorithms.
Abstract
Morpion Solitaire is a popular single-player game, played with paper and pencil. Due to its large state space (on the order of the game of Go), traditional search algorithms, such as MCTS, have not been able to find good solutions. A later algorithm, Nested Rollout Policy Adaptation, was able to find a new record of 82 steps, albeit with large computational resources. To the best of our knowledge, no further progress has been reported in the roughly ten years since that record. In this paper we take the recent impressive performance of deep self-learning reinforcement learning approaches from AlphaGo/AlphaZero as inspiration to design a searcher for Morpion Solitaire. A challenge of Morpion Solitaire is that the state space is sparse: there are few win/loss signals. Instead, we use an approach known as ranked reward to create a reinforcement learning self-play framework…
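The abstract names ranked reward but does not spell out the rule. The following is a minimal sketch, assuming the commonly described formulation in which each finished game's score is compared against a percentile of recent self-play scores and mapped to a binary win/loss signal. The class name `RankedReward` and the parameters `buffer_size` and `alpha` are hypothetical, chosen for illustration only, and are not taken from the paper.

```python
import random
from collections import deque


class RankedReward:
    """Turn raw episode scores into +1/-1 signals relative to recent play.

    Illustrative only: buffer_size and alpha are assumed values,
    not settings reported in the paper.
    """

    def __init__(self, buffer_size=200, alpha=0.75, seed=None):
        self.scores = deque(maxlen=buffer_size)  # most recent self-play scores
        self.alpha = alpha                       # percentile used as the threshold
        self.rng = random.Random(seed)

    def threshold(self):
        """Return the alpha-percentile of the buffered scores."""
        ordered = sorted(self.scores)
        index = min(int(self.alpha * len(ordered)), len(ordered) - 1)
        return ordered[index]

    def reward(self, score):
        """Record `score`, then return +1, -1, or a coin flip on a tie."""
        self.scores.append(score)
        cutoff = self.threshold()
        if score > cutoff:
            return 1
        if score < cutoff:
            return -1
        return self.rng.choice([1, -1])


# Example: in Morpion Solitaire the score would be the number of steps played.
ranker = RankedReward(buffer_size=5, alpha=0.5, seed=0)
for steps in (40, 45, 50, 55):
    ranker.reward(steps)
print(ranker.reward(60))  # a game longer than the recent median counts as a win
```

Because the threshold moves with the agent's own recent results, a single-player game with no opponent still yields a win/loss style signal of the kind AlphaZero-like training expects.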
