Impartial Games: A Challenge for Reinforcement Learning
Bei Zhou, Søren Riis

TL;DR
This paper demonstrates that AlphaZero-style reinforcement learning algorithms face fundamental challenges in mastering impartial games like Nim, due to their inability to learn abstract mathematical principles such as parity, especially as game complexity increases.
Contribution
The paper introduces a new framework to evaluate RL agents in impartial games and reveals inherent representational limitations of neural networks in learning abstract functions like parity.
Findings
AlphaZero-style agents succeed on small Nim boards but struggle as size increases.
Neural networks have difficulty learning non-associative functions like parity.
Current RL algorithms cannot effectively master impartial games beyond rote memorization.
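For readers unfamiliar with why parity matters in Nim, the sketch below shows the well-known closed-form rule for the game: the player to move loses under optimal play exactly when the bitwise XOR of the heap sizes (the "nim-sum") is zero. It assumes the normal-play convention (whoever takes the last item wins), which this summary does not state. The function names `nim_sum`, `winning_move`, and `parity` are illustrative and not taken from the paper.

```python
from functools import reduce
from operator import xor


def nim_sum(heaps):
    """XOR of all heap sizes (the nim-sum).

    Assumes normal play: a nim-sum of zero means the player to move
    loses if the opponent plays optimally.
    """
    return reduce(xor, heaps, 0)


def winning_move(heaps):
    """Return (heap_index, new_size) that leaves a zero nim-sum.

    Returns None when the position is already lost (nim-sum is zero).
    """
    total = nim_sum(heaps)
    if total == 0:
        return None
    for i, h in enumerate(heaps):
        target = h ^ total
        # A legal move must strictly reduce the heap.
        if target < h:
            return i, target
    return None  # not reached when total != 0


def parity(bits):
    """Parity of a bit sequence: 1 if the number of ones is odd, else 0."""
    return sum(bits) % 2


if __name__ == "__main__":
    print(nim_sum([1, 3, 5, 7]))
    print(winning_move([3, 4, 5]))
    print(parity([1, 0, 1, 1]))
```

The nim-sum is a column-wise parity computation over the binary representations of the heap sizes. An agent that could represent parity exactly would not need to memorize individual positions.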
Abstract
AlphaZero-style reinforcement learning (RL) algorithms have achieved superhuman performance in many complex board games such as Chess, Shogi, and Go. However, we showcase that these algorithms encounter significant and fundamental challenges when applied to impartial games, a class where players share game pieces and optimal strategy often relies on abstract mathematical principles. Specifically, we utilise the game of Nim as a concrete and illustrative case study to reveal critical limitations of AlphaZero-style and similar self-play RL algorithms. We introduce a novel conceptual framework distinguishing between champion and expert mastery to evaluate RL agent performance. Our findings reveal that while AlphaZero-style agents can achieve champion-level play on very small Nim boards, their learning progression severely degrades as the board size increases. This difficulty stems not…
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Sports Analytics and Performance · Explainable Artificial Intelligence (XAI)
Methods: Residual Connection · Batch Normalization · Residual Block · Prioritized Experience Replay · Convolution · Average Pooling · Monte-Carlo Tree Search · MuZero · AlphaZero
