Simultaneous AlphaZero: Extending Tree Search to Markov Games
Tyler Becker, Zachary Sunberg

TL;DR
Simultaneous AlphaZero extends AlphaZero to two-player zero-sum Markov games with simultaneous moves. It resolves each decision point as a matrix game and uses a regret-optimal solver to handle bandit feedback during tree search, yielding robust strategies in pursuit-evasion and satellite custody maintenance scenarios.
Contribution
The paper introduces an extension of AlphaZero to Markov games with simultaneous actions, in which joint action selection at each decision point is resolved by a matrix game solver designed for bandit feedback.
Findings
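Robust strategies in a continuous-state, discrete-action pursuit-evasion game
Robust strategies in satellite custody maintenance scenarios, even against maximally exploitative opponents
Uncertainty from bandit feedback during MCTS is handled with a regret-optimal matrix game solver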
Abstract
Simultaneous AlphaZero extends the AlphaZero framework to multistep, two-player zero-sum deterministic Markov games with simultaneous actions. At each decision point, joint action selection is resolved via matrix games whose payoffs incorporate both immediate rewards and future value estimates. To handle uncertainty arising from bandit feedback during Monte Carlo Tree Search (MCTS), Simultaneous AlphaZero incorporates a regret-optimal solver for matrix games with bandit feedback. Simultaneous AlphaZero demonstrates robust strategies in a continuous-state discrete-action pursuit-evasion game and satellite custody maintenance scenarios, even when evaluated against maximally exploitative opponents.
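The matrix-game step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a payoff entry of the form reward plus discounted value of the successor state (the discount factor is an assumption; the abstract does not mention one), and it swaps in plain full-feedback regret matching in place of the paper's regret-optimal bandit-feedback solver. The names `build_payoff_matrix`, `solve_matrix_game`, `reward`, `transition`, and `value` are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Matrix = List[List[float]]


def build_payoff_matrix(state, actions1: Sequence, actions2: Sequence,
                        reward: Callable, transition: Callable,
                        value: Callable, discount: float = 1.0) -> Matrix:
    """Payoff to player 1 for each joint action: immediate reward plus the
    (assumed) discounted value estimate of the deterministic successor."""
    return [[reward(state, a1, a2)
             + discount * value(transition(state, a1, a2))
             for a2 in actions2]
            for a1 in actions1]


def _strategy_from_regrets(regrets: List[float]) -> List[float]:
    """Regret matching: play proportionally to positive cumulative regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total <= 0.0:
        return [1.0 / len(regrets)] * len(regrets)  # uniform fallback
    return [p / total for p in positive]


def solve_matrix_game(payoff: Matrix, iterations: int = 50_000
                      ) -> Tuple[List[float], List[float], float]:
    """Approximate an equilibrium of a zero-sum matrix game by self-play.
    The row player maximizes payoff, the column player minimizes it.
    Uses full payoff information, unlike a bandit-feedback solver."""
    n_rows, n_cols = len(payoff), len(payoff[0])
    row_regret, col_regret = [0.0] * n_rows, [0.0] * n_cols
    row_sum, col_sum = [0.0] * n_rows, [0.0] * n_cols
    for _ in range(iterations):
        x = _strategy_from_regrets(row_regret)
        y = _strategy_from_regrets(col_regret)
        # Expected payoff of each pure action against the opponent's mix.
        row_vals = [sum(payoff[i][j] * y[j] for j in range(n_cols))
                    for i in range(n_rows)]
        col_vals = [sum(payoff[i][j] * x[i] for i in range(n_rows))
                    for j in range(n_cols)]
        expected = sum(x[i] * row_vals[i] for i in range(n_rows))
        for i in range(n_rows):
            row_regret[i] += row_vals[i] - expected
            row_sum[i] += x[i]
        for j in range(n_cols):
            col_regret[j] += expected - col_vals[j]  # column player minimizes
            col_sum[j] += y[j]
    # The time-averaged strategies are the ones that approach equilibrium.
    x_avg = [s / iterations for s in row_sum]
    y_avg = [s / iterations for s in col_sum]
    game_value = sum(x_avg[i] * payoff[i][j] * y_avg[j]
                     for i in range(n_rows) for j in range(n_cols))
    return x_avg, y_avg, game_value


if __name__ == "__main__":
    x, y, v = solve_matrix_game([[2.0, -1.0], [-1.0, 1.0]])
    print(x, y, v)
```

In a tree search, a solver of this kind would be called at each node on the matrix returned by `build_payoff_matrix`. With bandit feedback only the payoff of the sampled joint action is observed per visit, which is the setting the paper's solver targets and which this sketch does not cover.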
Taxonomy
Topics: Advanced Bandit Algorithms Research · Artificial Intelligence in Games · Reinforcement Learning in Robotics
