Mastering the Game of Stratego with Model-Free Multiagent Reinforcement Learning

Julien Perolat; Bart de Vylder; Daniel Hennes; Eugene Tarassov; Florian Strub; Vincent de Boer; Paul Muller; Jerome T. Connor; Neil Burch; Thomas Anthony; Stephen McAleer; Romuald Elie; Sarah H. Cen; Zhe Wang; Audrunas Gruslys; Aleksandra Malysheva; Mina Khan; Sherjil Ozair; Finbarr Timbers; Toby Pohlen; Tom Eccles; Mark Rowland; Marc Lanctot; Jean-Baptiste Lespiau; Bilal Piot; Shayegan Omidshafiei; Edward Lockhart; Laurent Sifre; Nathalie Beauguerlange; Remi Munos; David Silver; Satinder Singh; Demis Hassabis; Karl Tuyls

arXiv:2206.15378 · cs.AI · January 11, 2023 · 5 cites

TL;DR

DeepNash is a novel model-free deep reinforcement learning agent that masters the complex, imperfect information game Stratego through self-play, achieving top human-level performance and surpassing previous AI methods.

Contribution

The paper introduces DeepNash, a reinforcement learning agent built on the Regularised Nash Dynamics (R-NaD) algorithm, which converges to an approximate Nash equilibrium in Stratego, a long-standing AI challenge.
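To make the idea of converging to a Nash equilibrium concrete, here is a minimal toy sketch in the spirit of R-NaD, applied to rock-paper-scissors rather than Stratego. It follows the three stages the paper describes (reward transformation, dynamics, reference-policy update), but everything else is an assumption made for illustration: the function names, the tabular policies, the plain Euler step, and the values of `eta`, `lr`, and the iteration counts. DeepNash itself uses deep networks and a model-free learning update, which this sketch does not attempt to reproduce.

```python
import math

def softmax(logits):
    """Turn logits into a probability vector."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def regularized_payoffs(values, policy, reference, eta):
    """Reward transformation: penalize drifting away from the reference policy.
    (The opponent's regularization term is constant across a player's own
    actions in a matrix game, so it is omitted here.)"""
    return [v - eta * math.log(p / r) for v, p, r in zip(values, policy, reference)]

def replicator_step(logits, policy, fitness, lr):
    """One Euler step of replicator dynamics, written in logit space."""
    avg = sum(p * f for p, f in zip(policy, fitness))
    return [z + lr * (f - avg) for z, f in zip(logits, fitness)]

def r_nad_sketch(payoff, row_logits, col_logits,
                 eta=0.2, lr=0.05, outer_steps=20, inner_steps=1000):
    """Toy R-NaD-style loop for a two-player zero-sum matrix game.
    payoff[a][b] is the row player's payoff; all hyperparameters are
    illustrative assumptions, not values from the paper."""
    n_rows, n_cols = len(payoff), len(payoff[0])
    row_ref, col_ref = softmax(row_logits), softmax(col_logits)
    for _ in range(outer_steps):
        # Dynamics stage: run the regularized game toward its fixed point.
        for _ in range(inner_steps):
            x, y = softmax(row_logits), softmax(col_logits)
            row_values = [sum(payoff[a][b] * y[b] for b in range(n_cols))
                          for a in range(n_rows)]
            col_values = [-sum(payoff[a][b] * x[a] for a in range(n_rows))
                          for b in range(n_cols)]
            row_fit = regularized_payoffs(row_values, x, row_ref, eta)
            col_fit = regularized_payoffs(col_values, y, col_ref, eta)
            row_logits = replicator_step(row_logits, x, row_fit, lr)
            col_logits = replicator_step(col_logits, y, col_fit, lr)
        # Update stage: the fixed point becomes the next reference policy.
        row_ref, col_ref = softmax(row_logits), softmax(col_logits)
    return softmax(row_logits), softmax(col_logits)

# Rock-paper-scissors, whose unique Nash equilibrium is uniform play.
RPS = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]
row_policy, col_policy = r_nad_sketch(RPS, [1.0, 0.0, -1.0], [0.0, 0.5, -0.5])
print(row_policy, col_policy)
```

Starting from deliberately lopsided policies, both players should drift toward uniform play. The regularization toward a reference policy is what damps the cycling that unregularized replicator dynamics exhibit in this game, and repeatedly resetting the reference lets the fixed point move toward the equilibrium of the original game.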

Findings

01 DeepNash outperforms existing AI in Stratego.

02 Achieved top-3 ranking on Gravon platform in 2022.

03 Beats human expert players in Stratego.

Abstract

We introduce DeepNash, an autonomous agent capable of learning to play the imperfect information game Stratego from scratch, up to a human expert level. Stratego is one of the few iconic board games that Artificial Intelligence (AI) has not yet mastered. This popular game has an enormous game tree on the order of $10^{535}$ nodes, i.e., $10^{175}$ times larger than that of Go. It has the additional complexity of requiring decision-making under imperfect information, similar to Texas hold'em poker, which has a significantly smaller game tree (on the order of $10^{164}$ nodes). Decisions in Stratego are made over a large number of discrete actions with no obvious link between action and outcome. Episodes are long, with often hundreds of moves before a player wins, and situations in Stratego cannot easily be broken down into manageably sized sub-problems as in poker. For these reasons,…

Code & Models

Repositories

deepmind/open_spiel (official)

Taxonomy

Topics: Reinforcement Learning in Robotics · Artificial Intelligence in Games · Sports Analytics and Performance