Dota 2 with Large Scale Deep Reinforcement Learning

OpenAI: Christopher Berner; Greg Brockman; Brooke Chan; Vicki Cheung; Przemysław Dębiak; Christy Dennison; David Farhi; Quirin Fischer; Shariq Hashme; Chris Hesse; Rafal Józefowicz; Scott Gray; Catherine Olsson; Jakub Pachocki; Michael Petrov; Henrique P. d.O. Pinto; Jonathan Raiman; Tim Salimans; Jeremy Schlatter; Jonas Schneider; Szymon Sidor; Ilya Sutskever; Jie Tang; Filip Wolski; Susan Zhang

arXiv:1912.06680 · cs.LG · March 10, 2021 · 1.0k cites

TL;DR

This paper describes how OpenAI built OpenAI Five, a large-scale deep reinforcement learning system that learned to play Dota 2 through extensive distributed self-play training and reached superhuman performance, defeating the Dota 2 world champions (Team OG).
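
The summary does not name the learning algorithm. The full paper trains OpenAI Five with Proximal Policy Optimization (PPO). As a minimal, framework-free sketch of PPO's standard clipped surrogate loss for one batch of samples (not OpenAI's training code), the function below assumes per-sample log-probabilities and advantage estimates are already computed; the function name is hypothetical, and `clip_eps=0.2` is the common default from the original PPO paper rather than a value taken from this summary.

```python
import math


def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Negative of the PPO clipped surrogate objective, averaged over a batch.

    logp_new / logp_old: log-probabilities of the taken actions under the
    current policy and under the policy that collected the data.
    advantages: advantage estimates for the same samples.
    """
    total = 0.0
    for new, old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(new - old)  # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Keep the pessimistic (smaller) of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    # Negate so that minimizing this loss maximizes the surrogate objective.
    return -total / len(advantages)
```

For example, `ppo_clip_loss([math.log(2.0)], [0.0], [1.0])` returns `-1.2`: the probability ratio of 2 is clipped to 1.2, which limits how far a single update can push the policy toward an action that looked good.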

Contribution

The paper introduces a scalable distributed training system for reinforcement learning, together with tools for continual training, and demonstrates its effectiveness on Dota 2, a complex imperfect-information game.
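
In broad terms, the full paper's system separates rollout workers, which play games and generate experience, from optimizers, which consume that experience in large batches. The sketch below is a toy, single-process stand-in for that producer/consumer split, using threads and a shared queue; all names and sizes are hypothetical, and it is not OpenAI's implementation.

```python
import queue
import threading

# Toy sizes (hypothetical); the real system used far larger batches.
CHUNK_FRAMES = 4   # frames a rollout worker sends per message
BATCH_FRAMES = 16  # frames the learner needs for one optimizer step


def rollout_worker(worker_id, n_chunks, experience):
    """Producer: plays the game (stubbed out here) and ships experience chunks."""
    for step in range(n_chunks):
        # A real worker would step the environment with the current policy here.
        chunk = [(worker_id, step, i) for i in range(CHUNK_FRAMES)]
        experience.put(chunk)


def learner(experience, n_batches):
    """Consumer: assembles fixed-size batches from whatever the workers have sent."""
    batches, buffer = [], []
    while len(batches) < n_batches:
        buffer.extend(experience.get())  # blocks until some worker sends data
        while len(buffer) >= BATCH_FRAMES and len(batches) < n_batches:
            batches.append(buffer[:BATCH_FRAMES])  # an optimizer step would run here
            buffer = buffer[BATCH_FRAMES:]
    return batches


def train(n_workers=4, chunks_per_worker=8):
    experience = queue.Queue()
    workers = [
        threading.Thread(target=rollout_worker, args=(w, chunks_per_worker, experience))
        for w in range(n_workers)
    ]
    for t in workers:
        t.start()
    total_frames = n_workers * chunks_per_worker * CHUNK_FRAMES
    batches = learner(experience, total_frames // BATCH_FRAMES)
    for t in workers:
        t.join()
    return batches
```

With the defaults, `train()` returns 8 batches of 16 frames each (4 workers × 8 chunks × 4 frames = 128 frames). Decoupling the two roles this way lets the number of workers grow independently of the optimizer, which is the property that makes such a design scalable.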

Findings

01

OpenAI Five defeated the Dota 2 world champion Team OG.

02

The system trained continually for 10 months, learning from batches of approximately 2 million frames every 2 seconds.

03

Self-play reinforcement learning achieved superhuman performance.
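
A quick back-of-the-envelope check of the abstract's figure of approximately 2 million frames every 2 seconds. The per-day number is hypothetical: it assumes that rate held steadily for a full day, which the abstract does not claim.

```python
FRAMES_PER_BATCH = 2_000_000  # "approximately 2 million frames" (abstract)
SECONDS_PER_BATCH = 2.0       # "every 2 seconds" (abstract)

# Steady-state throughput implied by those two numbers.
frames_per_second = FRAMES_PER_BATCH / SECONDS_PER_BATCH  # 1,000,000 frames/s

# Hypothetical: frames consumed in one day if that rate were sustained.
frames_per_day = frames_per_second * 24 * 60 * 60  # 86,400,000,000 frames
```

So the optimizer ingests on the order of a million frames per second, roughly 86 billion per day at that pace.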

Abstract

On April 13th, 2019, OpenAI Five became the first AI system to defeat the world champions at an esports game. The game of Dota 2 presents novel challenges for AI systems such as long time horizons, imperfect information, and complex, continuous state-action spaces, all challenges which will become increasingly central to more capable AI systems. OpenAI Five leveraged existing reinforcement learning techniques, scaled to learn from batches of approximately 2 million frames every 2 seconds. We developed a distributed training system and tools for continual training which allowed us to train OpenAI Five for 10 months. By defeating the Dota 2 world champion (Team OG), OpenAI Five demonstrates that self-play reinforcement learning can achieve superhuman performance on a difficult task.
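
The "tools for continual training" mentioned in the abstract are what the full paper calls surgery: changing the model, for example by adding new observations, without restarting training from scratch. One simple way to make such a change function-preserving, and the approach this sketch assumes, is to give a new input feature zero-initialized weights so the network initially ignores it. The helper names are hypothetical and a single dense layer stands in for the real network.

```python
def linear(weights, bias, x):
    """Dense layer y = W x + b, with W stored as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def add_input_features(weights, n_new):
    """Hypothetical surgery helper: widen a layer to accept n_new extra inputs.

    Each row gains n_new zero weights, so the new features have no effect on
    the output until training moves those weights away from zero.
    """
    return [row + [0.0] * n_new for row in weights]


W = [[1.0, 2.0], [0.5, -1.0]]
b = [0.1, 0.0]
x_old = [3.0, 4.0]

W_new = add_input_features(W, n_new=2)
x_new = x_old + [7.0, -2.0]  # the same state, now with two extra observations

print(linear(W, b, x_old))      # [11.1, -2.5]
print(linear(W_new, b, x_new))  # [11.1, -2.5]  (function preserved)
```

Because the widened layer computes exactly the same outputs as before, the trained policy's behavior is unchanged at the moment of surgery, and training can continue from there.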

Code & Models

Repositories

bilibili/LastOrder-Dota2

Videos

OpenAI Performs Surgery On A Neural Network to Play DOTA 2 · YouTube

Taxonomy

Topics: Reinforcement Learning in Robotics · Artificial Intelligence in Games · Advanced Bandit Algorithms Research