Superhuman AI for Stratego Using Self-Play Reinforcement Learning and Test-Time Search

Samuel Sokota; Eugene Vinitsky; Hengyuan Hu; J. Zico Kolter; Gabriele Farina

arXiv:2511.07312 · cs.LG · November 11, 2025

TL;DR

This paper shows that self-play reinforcement learning combined with test-time search achieves vastly superhuman performance in Stratego for a few thousand dollars. That is a small fraction of the multimillion-dollar training costs of earlier efforts, which fell short of top human play. The result marks a significant advance in AI for complex imperfect-information games.

Contribution

The authors develop general approaches for self-play reinforcement learning and test-time search under imperfect information. They use these approaches to surpass top human performance in Stratego at low cost.

Findings

01

Achieved vastly superhuman Stratego performance at a cost of only a few thousand dollars.

02

Developed general approaches for self-play reinforcement learning and test-time search under imperfect information.

03

Significantly outperformed previous AI approaches in Stratego.

Abstract

Few classical games have been regarded as such significant benchmarks of artificial intelligence as to have justified training costs in the millions of dollars. Among these, Stratego -- a board wargame exemplifying the challenge of strategic decision making under massive amounts of hidden information -- stands apart as a case where such efforts failed to produce performance at the level of top humans. This work establishes a step change in both performance and cost for Stratego, showing that it is now possible not only to reach the level of top humans, but to achieve vastly superhuman level -- and that doing so requires not an industrial budget, but merely a few thousand dollars. We achieved this result by developing general approaches for self-play reinforcement learning and test-time search under imperfect information.

Taxonomy

Topics: Reinforcement Learning in Robotics · Artificial Intelligence in Games · Advanced Bandit Algorithms Research