Deep Reinforcement Learning from Self-Play in Imperfect-Information Games
Johannes Heinrich, David Silver

TL;DR
This paper introduces Neural Fictitious Self-Play (NFSP), a scalable deep reinforcement learning method that learns approximate Nash equilibria in large-scale imperfect-information games without prior domain knowledge.
Contribution
It presents the first end-to-end approach combining self-play and deep RL to learn equilibria in complex imperfect-information games.
Findings
NFSP approached Nash equilibrium in Leduc poker.
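In Limit Texas Holdem, NFSP learned a strategy that approached the performance of state-of-the-art, superhuman algorithms built on significant domain expertise.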
Common reinforcement learning methods diverged when applied to Leduc poker.
Abstract
Many real-world applications can be described as large-scale games of imperfect information. To deal with these challenging domains, prior work has focused on computing Nash equilibria in a handcrafted abstraction of the domain. In this paper we introduce the first scalable end-to-end approach to learning approximate Nash equilibria without prior domain knowledge. Our method combines fictitious self-play with deep reinforcement learning. When applied to Leduc poker, Neural Fictitious Self-Play (NFSP) approached a Nash equilibrium, whereas common reinforcement learning methods diverged. In Limit Texas Holdem, a poker game of real-world scale, NFSP learnt a strategy that approached the performance of state-of-the-art, superhuman algorithms based on significant domain expertise.
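The abstract describes NFSP as a combination of fictitious self-play and deep reinforcement learning. To illustrate only the fictitious-play half of that idea, here is a minimal sketch of classical tabular fictitious play on matching pennies, a two-player zero-sum game chosen here purely as an example (it does not appear in the summary above). Each player repeatedly best-responds to the opponent's empirical average strategy, and the averages drift toward the equilibrium of (0.5, 0.5). This is not the authors' implementation: NFSP works with neural networks on much larger games, and the function name `fictitious_play` and the constant `MATCHING_PENNIES` are hypothetical.

```python
import numpy as np

# Row player's payoff in matching pennies (assumed example game):
# the row player wins 1 when both actions match, loses 1 otherwise.
# The column player's payoff is the negative (zero-sum).
MATCHING_PENNIES = np.array([[1.0, -1.0],
                             [-1.0, 1.0]])


def fictitious_play(payoff, iterations):
    """Classical tabular fictitious play for a two-player zero-sum matrix game.

    Returns the empirical average strategies of the row and column players.
    """
    n_rows, n_cols = payoff.shape
    row_counts = np.zeros(n_rows)
    col_counts = np.zeros(n_cols)
    # Arbitrary initial actions so the averages are well defined.
    row_counts[0] = 1
    col_counts[0] = 1

    for _ in range(iterations):
        row_avg = row_counts / row_counts.sum()
        col_avg = col_counts / col_counts.sum()
        # Row player maximizes expected payoff against the column average.
        row_best = np.argmax(payoff @ col_avg)
        # Column player minimizes the row player's expected payoff.
        col_best = np.argmin(row_avg @ payoff)
        row_counts[row_best] += 1
        col_counts[col_best] += 1

    return row_counts / row_counts.sum(), col_counts / col_counts.sum()


if __name__ == "__main__":
    row_avg, col_avg = fictitious_play(MATCHING_PENNIES, 10_000)
    print("row average strategy:", row_avg)
    print("column average strategy:", col_avg)
```

The individual best responses keep cycling between the two actions, but the averaged strategies settle near the equilibrium. That distinction between the current best response and the average strategy is the part of fictitious play that the sketch is meant to show.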
Taxonomy
Topics: Artificial Intelligence in Games · Reinforcement Learning in Robotics · Gambling Behavior and Treatments
