
TL;DR
The paper introduces Athénan, a Minimax-based reinforcement learning approach that is more efficient than AlphaZero-like methods on multiple games, in large part because it generates training data far more cheaply.
Contribution
It presents Athénan, which uses a Minimax-based search algorithm called Descent, different learning targets, and no policy, and shows that it is much more efficient than Polygames, a reimplementation of AlphaZero, on multiple games.
Findings
With the same resources, Athénan without the reinforcement heuristic is at least 7 times faster than Polygames.
Athénan remains competitive with Polygames even when Polygames uses 100 times more GPUs (at least for some games).
Training data generation cost is approximately 296 times lower with Athénan.
Abstract
Deep Reinforcement Learning reaches a superhuman level of play in many complete information games. The state-of-the-art algorithm for learning with zero knowledge is AlphaZero. We take another approach, Athénan, which uses a different, Minimax-based search algorithm called Descent, as well as different learning targets, and which does not use a policy. We show that for multiple games it is much more efficient than the reimplementation of AlphaZero: Polygames. It is even competitive with Polygames when Polygames uses 100 times more GPU (at least for some games). One of the keys to the superior performance is that the cost of generating state data for training is approximately 296 times lower with Athénan. With the same reasonable resources, Athénan without reinforcement heuristic is at least 7 times faster than Polygames and much more than 30 times faster with reinforcement…
