XDO: A Double Oracle Algorithm for Extensive-Form Games
Stephen McAleer, John Lanier, Kevin Wang, Pierre Baldi, Roy Fox

TL;DR
The paper introduces XDO, a double oracle algorithm for two-player zero-sum extensive-form games that is guaranteed to converge to an approximate Nash equilibrium in a number of iterations linear in the number of infostates, and Neural XDO (NXDO), a deep RL variant capable of handling high-dimensional continuous actions.
Contribution
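It proposes XDO, which improves convergence over PSRO by mixing best responses at every infostate rather than only at the root of the game, and Neural XDO (NXDO), which learns best responses through deep RL and which the authors present as the first deep RL method for high-dimensional continuous-action sequential games.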
Findings
XDO converges faster than PSRO in large games.
Tabular XDO achieves lower exploitability than CFR.
NXDO outperforms PSRO and NFSP in continuous-action games.
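Exploitability, the metric used in the findings above, can be illustrated on a small matrix game. The following is a minimal sketch, not the paper's evaluation code: it assumes a two-player zero-sum game given as a payoff matrix for the row player, and it reports the sum of both players' best-response gains (some papers halve this value). The names `exploitability`, `RPS`, and `uniform` are hypothetical, chosen for illustration.

```python
def exploitability(payoff, x, y):
    """Sum of best-response gains for strategy pair (x, y).

    payoff[i][j] is the row player's payoff; the column player receives
    the negative. A value of 0 means (x, y) is a Nash equilibrium.
    """
    rows, cols = len(payoff), len(payoff[0])
    # Best value the row player can obtain against the column mix y.
    br_row = max(sum(payoff[i][j] * y[j] for j in range(cols)) for i in range(rows))
    # The column player's best response minimizes the row player's payoff against x.
    br_col = min(sum(x[i] * payoff[i][j] for i in range(rows)) for j in range(cols))
    return br_row - br_col


# Rock-paper-scissors, used here only as a toy example.
RPS = [[0, -1, 1],
       [1, 0, -1],
       [-1, 1, 0]]
uniform = [1 / 3, 1 / 3, 1 / 3]

at_equilibrium = exploitability(RPS, uniform, uniform)        # close to 0
always_rock = exploitability(RPS, [1.0, 0.0, 0.0], uniform)   # close to 1
```

The uniform profile cannot be exploited, while a row player who always plays rock loses one unit to a best-responding opponent.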
Abstract
Policy Space Response Oracles (PSRO) is a reinforcement learning (RL) algorithm for two-player zero-sum games that has been empirically shown to find approximate Nash equilibria in large games. Although PSRO is guaranteed to converge to an approximate Nash equilibrium and can handle continuous actions, it may take an exponential number of iterations as the number of information states (infostates) grows. We propose Extensive-Form Double Oracle (XDO), an extensive-form double oracle algorithm for two-player zero-sum games that is guaranteed to converge to an approximate Nash equilibrium linearly in the number of infostates. Unlike PSRO, which mixes best responses at the root of the game, XDO mixes best responses at every infostate. We also introduce Neural XDO (NXDO), where the best response is learned through deep RL. In tabular experiments on Leduc poker, we find that XDO achieves an…
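The contrast between mixing at the root and mixing at every infostate can be made concrete with a toy count. The sketch below is an illustration under stated assumptions, not the authors' implementation: each best response is a hypothetical dictionary from infostate name to action, every infostate is assumed reachable, and the infostate names and actions are invented for the example. Mixing at the root can only weight the best responses already in the population, whereas allowing, at each infostate, any action chosen there by some best response admits every combination of those actions.

```python
from itertools import product


def restricted_actions(best_responses):
    """Per infostate, collect the actions chosen by any best response."""
    allowed = {}
    for br in best_responses:
        for infostate, action in br.items():
            allowed.setdefault(infostate, set()).add(action)
    return allowed


def infostate_level_strategies(best_responses):
    """Enumerate pure strategies available when mixing at every infostate."""
    allowed = restricted_actions(best_responses)
    infostates = sorted(allowed)
    choices = product(*(sorted(allowed[s]) for s in infostates))
    return [dict(zip(infostates, choice)) for choice in choices]


# Two hypothetical best responses that disagree at all three infostates.
population = [
    {"s0": "fold", "s1": "fold", "s2": "fold"},
    {"s0": "bet", "s1": "bet", "s2": "bet"},
]

root_support = len(population)                                   # 2 pure strategies
infostate_support = len(infostate_level_strategies(population))  # 2**3 = 8
```

With the same two best responses, the infostate-level restricted game already contains 8 pure strategies, which gives some intuition for why a root-level method may need many more iterations to cover the same strategy space.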
Taxonomy
Topics: Reinforcement Learning in Robotics · Artificial Intelligence in Games · Sports Analytics and Performance
