Simultaneous AlphaZero: Extending Tree Search to Markov Games

Tyler Becker; Zachary Sunberg

arXiv:2512.12486·cs.GT·December 16, 2025

Open Access

TL;DR

Simultaneous AlphaZero extends AlphaZero to two-player zero-sum Markov games with simultaneous moves. It resolves each decision point as a matrix game and uses a regret-optimal solver to cope with bandit feedback during tree search, yielding robust strategies in pursuit-evasion and satellite custody scenarios.

Contribution

The paper introduces an extension of AlphaZero to Markov games with simultaneous actions, in which each decision point is resolved by a matrix-game solver designed to handle bandit feedback.

Findings

01 Robust strategies in a continuous-state, discrete-action pursuit-evasion game

02 Effective in satellite custody maintenance scenarios

03 Handles uncertainty from bandit feedback with a regret-optimal matrix-game solver

Abstract

Simultaneous AlphaZero extends the AlphaZero framework to multistep, two-player zero-sum deterministic Markov games with simultaneous actions. At each decision point, joint action selection is resolved via matrix games whose payoffs incorporate both immediate rewards and future value estimates. To handle uncertainty arising from bandit feedback during Monte Carlo Tree Search (MCTS), Simultaneous AlphaZero incorporates a regret-optimal solver for matrix games with bandit feedback. Simultaneous AlphaZero demonstrates robust strategies in a continuous-state discrete-action pursuit-evasion game and satellite custody maintenance scenarios, even when evaluated against maximally exploitative opponents.
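
To make the matrix-game step in the abstract concrete, here is a minimal Python sketch under stated assumptions. It builds a payoff matrix whose entries combine an immediate reward with a discounted value estimate of the deterministic successor state, then approximates an equilibrium of that zero-sum matrix game. The names `build_payoff_matrix`, `solve_zero_sum`, `step`, `reward`, `value`, and `gamma` are hypothetical and do not come from the paper. The solver is plain full-information regret matching in self-play, used as a stand-in. It is not the regret-optimal bandit-feedback solver the paper describes.

```python
from typing import Callable, List, Sequence, Tuple


def build_payoff_matrix(state, actions1: Sequence, actions2: Sequence,
                        step: Callable, reward: Callable, value: Callable,
                        gamma: float = 0.95) -> List[List[float]]:
    """Q[i][j] = r(s, a1_i, a2_j) + gamma * V(s'), with s' = step(s, a1_i, a2_j).

    Payoffs are from player 1's (the maximizer's) point of view. `value` is a
    placeholder for a learned value estimate (an assumption of this sketch).
    """
    return [
        [reward(state, a1, a2) + gamma * value(step(state, a1, a2))
         for a2 in actions2]
        for a1 in actions1
    ]


def _strategy_from_regrets(regrets: List[float]) -> List[float]:
    """Play in proportion to positive cumulative regret; uniform if none."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    n = len(regrets)
    if total <= 0.0:
        return [1.0 / n] * n
    return [p / total for p in positive]


def solve_zero_sum(Q: List[List[float]],
                   iters: int = 20000) -> Tuple[List[float], List[float], float]:
    """Approximate an equilibrium of a zero-sum matrix game by regret matching.

    Returns the time-averaged strategies of both players and the payoff
    x_avg^T Q y_avg. Full-information updates are assumed here.
    """
    m, n = len(Q), len(Q[0])
    reg1, reg2 = [0.0] * m, [0.0] * n
    sum1, sum2 = [0.0] * m, [0.0] * n
    for _ in range(iters):
        x = _strategy_from_regrets(reg1)
        y = _strategy_from_regrets(reg2)
        # Expected payoff of each pure action against the opponent's current mix.
        u1 = [sum(Q[i][j] * y[j] for j in range(n)) for i in range(m)]
        u2 = [-sum(Q[i][j] * x[i] for i in range(m)) for j in range(n)]
        v1 = sum(x[i] * u1[i] for i in range(m))  # player 1's expected payoff
        for i in range(m):
            reg1[i] += u1[i] - v1
            sum1[i] += x[i]
        for j in range(n):
            reg2[j] += u2[j] + v1  # player 2's expected payoff is -v1
            sum2[j] += y[j]
    x_avg = [s / iters for s in sum1]
    y_avg = [s / iters for s in sum2]
    game_value = sum(x_avg[i] * Q[i][j] * y_avg[j]
                     for i in range(m) for j in range(n))
    return x_avg, y_avg, game_value


if __name__ == "__main__":
    # Toy 1-D chase (hypothetical): state = (pursuer, evader) positions.
    moves = [-1, 0, 1]
    Q = build_payoff_matrix(
        (0, 2), moves, moves,
        step=lambda s, a1, a2: (s[0] + a1, s[1] + a2),
        reward=lambda s, a1, a2: 0.0,
        value=lambda s: -abs(s[0] - s[1]),  # stand-in for a value network
    )
    print(solve_zero_sum(Q, iters=2000))
```

In a tree search, a solve of this kind would run at each node, with the matrix entries refined as child estimates improve. Regret matching was chosen only because it is short and dependency-free; its average strategies converge to an equilibrium in zero-sum games, but it assumes every entry of the matrix is observable, which bandit feedback does not provide.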

Taxonomy

Topics: Advanced Bandit Algorithms Research · Artificial Intelligence in Games · Reinforcement Learning in Robotics