Deep Reinforcement Learning from Self-Play in Imperfect-Information Games

Johannes Heinrich; David Silver

arXiv:1603.01121 · cs.LG · June 29, 2016 · 145 citations

TL;DR

This paper introduces Neural Fictitious Self-Play (NFSP), a scalable deep reinforcement learning method that learns approximate Nash equilibria in large-scale imperfect-information games without prior domain knowledge.

Contribution

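The paper presents the first scalable end-to-end approach that combines fictitious self-play with deep reinforcement learning to learn approximate Nash equilibria in large-scale imperfect-information games without prior domain knowledge.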

Findings

01. NFSP approached a Nash equilibrium in Leduc poker.

02. In Limit Texas Holdem, NFSP learnt a strategy that approached the performance of state-of-the-art, superhuman algorithms.

03. Common reinforcement learning methods diverged in Leduc poker.

Abstract

Many real-world applications can be described as large-scale games of imperfect information. To deal with these challenging domains, prior work has focused on computing Nash equilibria in a handcrafted abstraction of the domain. In this paper we introduce the first scalable end-to-end approach to learning approximate Nash equilibria without prior domain knowledge. Our method combines fictitious self-play with deep reinforcement learning. When applied to Leduc poker, Neural Fictitious Self-Play (NFSP) approached a Nash equilibrium, whereas common reinforcement learning methods diverged. In Limit Texas Holdem, a poker game of real-world scale, NFSP learnt a strategy that approached the performance of state-of-the-art, superhuman algorithms based on significant domain expertise.
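
To make the notions of self-play and "approaching a Nash equilibrium" concrete, the following is a minimal sketch of classical tabular fictitious play on rock-paper-scissors, a small two-player zero-sum matrix game. It is not NFSP itself: it uses no neural networks, no sampling, and no imperfect information. The function names (`fictitious_play`, `exploitability`) and the `RPS` payoff matrix are hypothetical and chosen only for illustration. Each player repeatedly best-responds to the opponent's empirical average strategy, and the averages drift toward the uniform equilibrium.

```python
def fictitious_play(payoff, iterations=20000):
    """Classical fictitious play on a two-player zero-sum matrix game.

    payoff[i][j] is the row player's payoff; the column player gets the negative.
    Returns the empirical average strategies of both players.
    """
    n_rows, n_cols = len(payoff), len(payoff[0])
    row_counts = [0] * n_rows
    col_counts = [0] * n_cols
    # Arbitrary initial actions (an assumption of this sketch).
    row_counts[0] = 1
    col_counts[0] = 1
    for _ in range(iterations):
        # Value of each action against the opponent's empirical action counts.
        row_values = [sum(payoff[i][j] * col_counts[j] for j in range(n_cols))
                      for i in range(n_rows)]
        col_values = [sum(payoff[i][j] * row_counts[i] for i in range(n_rows))
                      for j in range(n_cols)]
        # Row player maximises, column player minimises the row player's payoff.
        row_br = max(range(n_rows), key=row_values.__getitem__)
        col_br = min(range(n_cols), key=col_values.__getitem__)
        row_counts[row_br] += 1
        col_counts[col_br] += 1
    row_total, col_total = sum(row_counts), sum(col_counts)
    return ([c / row_total for c in row_counts],
            [c / col_total for c in col_counts])


def exploitability(payoff, row_strategy, col_strategy):
    """Sum of both players' best-response gains; zero exactly at a Nash equilibrium."""
    n_rows, n_cols = len(payoff), len(payoff[0])
    best_row = max(sum(payoff[i][j] * col_strategy[j] for j in range(n_cols))
                   for i in range(n_rows))
    best_col = min(sum(payoff[i][j] * row_strategy[i] for i in range(n_rows))
                   for j in range(n_cols))
    return best_row - best_col


# Rock-paper-scissors from the row player's perspective.
RPS = [[0, -1, 1],
       [1, 0, -1],
       [-1, 1, 0]]

row_avg, col_avg = fictitious_play(RPS)
print(row_avg, col_avg, exploitability(RPS, row_avg, col_avg))
```

Exploitability here measures how much a best-responding opponent could gain against the average strategies, so a value shrinking toward zero is one way to read "approaching a Nash equilibrium" in a small game like this.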

Taxonomy

Topics: Artificial Intelligence in Games · Reinforcement Learning in Robotics · Gambling Behavior and Treatments