XDO: A Double Oracle Algorithm for Extensive-Form Games

Stephen McAleer; John Lanier; Kevin Wang; Pierre Baldi; Roy Fox

arXiv:2103.06426 · cs.GT · February 1, 2022 · 5 cites


TL;DR

The paper introduces XDO, a double oracle algorithm for extensive-form games that is guaranteed to converge to an approximate Nash equilibrium in a number of iterations linear in the number of infostates, and Neural XDO (NXDO), a deep RL variant capable of handling high-dimensional continuous actions.
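Both PSRO and XDO build on the double oracle loop: solve a restricted game over a small population of strategies, compute best responses in the full game against that solution, add them to the population, and repeat until no new best response appears. The sketch below shows the classic normal-form version on a two-player zero-sum matrix game; it is not XDO itself. It assumes a payoff matrix `A` for a maximizing row player and uses fictitious play as a simple stand-in meta-solver; the names `fictitious_play`, `double_oracle`, and `RPS` are illustrative.

```python
# Classic normal-form double oracle on a two-player zero-sum matrix game (not XDO itself).
# A[i][j] is the payoff to the row player (maximizer); the column player minimizes it.
Matrix = list[list[float]]

RPS: Matrix = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]  # rock, paper, scissors


def fictitious_play(A: Matrix, rows: list[int], cols: list[int], iters: int = 20000):
    """Approximate equilibrium of the restricted game A[rows][cols] (stand-in meta-solver)."""
    row_counts, col_counts = [0] * len(rows), [0] * len(cols)
    row_payoff, col_payoff = [0.0] * len(rows), [0.0] * len(cols)  # cumulative payoffs
    r, c = 0, 0
    for _ in range(iters):
        row_counts[r] += 1
        col_counts[c] += 1
        for a in range(len(rows)):
            row_payoff[a] += A[rows[a]][cols[c]]
        for b in range(len(cols)):
            col_payoff[b] += A[rows[r]][cols[b]]
        # Each player best-responds to the opponent's empirical play so far.
        r = max(range(len(rows)), key=row_payoff.__getitem__)
        c = min(range(len(cols)), key=col_payoff.__getitem__)
    return [n / iters for n in row_counts], [n / iters for n in col_counts]


def double_oracle(A: Matrix, max_iters: int = 100):
    rows, cols = [0], [0]  # initial populations: one pure strategy per player
    for _ in range(max_iters):
        x, y = fictitious_play(A, rows, cols)
        # Best responses in the FULL game to the restricted-game solution.
        row_br = max(range(len(A)),
                     key=lambda i: sum(q * A[i][j] for q, j in zip(y, cols)))
        col_br = min(range(len(A[0])),
                     key=lambda j: sum(p * A[i][j] for p, i in zip(x, rows)))
        if row_br in rows and col_br in cols:
            break  # no new strategy: the restricted solution is (approximately) an equilibrium
        if row_br not in rows:
            rows.append(row_br)
        if col_br not in cols:
            cols.append(col_br)
    return rows, cols, x, y


rows, cols, x, y = double_oracle(RPS)
print(rows, cols)  # [0, 1, 2] [0, 1, 2]
```

Starting from Rock for both players, the loop adds Paper, then Scissors, and stops once the best responses to the near-uniform restricted solution are already in the population. Because fictitious play is only approximate, the returned mixtures are close to, but not exactly, uniform.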

Contribution

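It proposes XDO, which mixes best responses at every infostate rather than only at the root of the game as PSRO does, avoiding PSRO's potentially exponential number of iterations, and Neural XDO, which the authors describe as the first deep RL method that can find an approximate Nash equilibrium in high-dimensional continuous-action sequential games.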

Findings

01

In tabular experiments on Leduc poker, XDO reaches an approximate Nash equilibrium in an order of magnitude fewer iterations than PSRO.

02

On a modified Leduc poker game and Oshi-Zumo, tabular XDO achieves lower exploitability than CFR with the same amount of computation.

03

NXDO outperforms PSRO and NFSP on a sequential multidimensional continuous-action game.
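The findings compare methods by exploitability: how much best-responding opponents could gain against a strategy profile, which is zero at an exact Nash equilibrium. As a minimal sketch for a two-player zero-sum matrix game (a normal-form stand-in; the paper's games are extensive-form and need a tree-walking best response), the sum of both players' best-response gains can be computed as below. Normalization conventions vary (some papers report half this sum), so this is not necessarily the paper's exact metric; the names `exploitability` and `RPS` are illustrative.

```python
Matrix = list[list[float]]

RPS: Matrix = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]  # rock, paper, scissors


def exploitability(A: Matrix, x: list[float], y: list[float]) -> float:
    """Sum of both players' best-response gains against (x, y) in the zero-sum game A.

    The row player receives A[i][j] and maximizes; the column player receives -A[i][j].
    """
    n_rows, n_cols = len(A), len(A[0])
    # Expected payoff to the row player under the profile (x, y).
    value = sum(x[i] * A[i][j] * y[j] for i in range(n_rows) for j in range(n_cols))
    # Best pure-response payoff of the row player to y, and of the column player to x.
    best_row = max(sum(A[i][j] * y[j] for j in range(n_cols)) for i in range(n_rows))
    best_col = min(sum(x[i] * A[i][j] for i in range(n_rows)) for j in range(n_cols))
    # Row gain (best_row - value) plus column gain (value - best_col).
    return (best_row - value) + (value - best_col)


print(exploitability(RPS, [1 / 3] * 3, [1 / 3] * 3))  # 0.0
print(exploitability(RPS, [1, 0, 0], [1, 0, 0]))      # 2
```

In rock-paper-scissors the uniform profile cannot be exploited, while "both always play Rock" has exploitability 2: each player gains 1 by switching to Paper.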

Abstract

Policy Space Response Oracles (PSRO) is a reinforcement learning (RL) algorithm for two-player zero-sum games that has been empirically shown to find approximate Nash equilibria in large games. Although PSRO is guaranteed to converge to an approximate Nash equilibrium and can handle continuous actions, it may take an exponential number of iterations as the number of information states (infostates) grows. We propose Extensive-Form Double Oracle (XDO), an extensive-form double oracle algorithm for two-player zero-sum games that is guaranteed to converge to an approximate Nash equilibrium linearly in the number of infostates. Unlike PSRO, which mixes best responses at the root of the game, XDO mixes best responses at every infostate. We also introduce Neural XDO (NXDO), where the best response is learned through deep RL. In tabular experiments on Leduc poker, we find that XDO achieves an…
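The abstract's contrast between mixing at the root (PSRO) and mixing at every infostate (XDO) can be illustrated by counting. A minimal way to picture infostate-level mixing, assuming deterministic tabular policies stored as infostate-to-action maps (a hypothetical representation chosen here for illustration), is a restricted game that allows, at each infostate, any action some population policy takes there. The function names below are hypothetical, and the count ignores reachability, so it is only an upper bound on distinct behaviors.

```python
from math import prod

# Hypothetical representation: a deterministic tabular policy maps infostate -> action.
Policy = dict[str, str]


def restricted_action_sets(population: list[Policy]) -> dict[str, set[str]]:
    """Per infostate, allow every action that some population policy takes there."""
    allowed: dict[str, set[str]] = {}
    for policy in population:
        for infostate, action in policy.items():
            allowed.setdefault(infostate, set()).add(action)
    return allowed


def root_mixture_support(population: list[Policy]) -> int:
    """Root-level mixing: the meta-solver mixes over whole (distinct) population policies."""
    return len({tuple(sorted(p.items())) for p in population})


def restricted_pure_policies(population: list[Policy]) -> int:
    """Infostate-level mixing: allowed actions at different infostates combine freely.
    Ignores reachability, so this is an upper bound on distinct behaviors."""
    return prod(len(actions) for actions in restricted_action_sets(population).values())


n = 10
always_left = {f"s{i}": "left" for i in range(n)}
always_right = {f"s{i}": "right" for i in range(n)}
population = [always_left, always_right]
print(root_mixture_support(population))      # 2
print(restricted_pure_policies(population))  # 1024
```

Two population members that disagree at all ten infostates already let the restricted game express 2^10 = 1024 deterministic policies, while a root-level mixture can only randomize between the two originals. This conveys the intuition behind the abstract's exponential-versus-linear contrast; it is not a proof of it.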

Code & Models

Repositories

indylab/nxdo · PyTorch · Official

Videos

XDO: A Double Oracle Algorithm for Extensive-Form Games · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Artificial Intelligence in Games · Sports Analytics and Performance