AlphaStar Unplugged: Large-Scale Offline Reinforcement Learning

Michaël Mathieu; Sherjil Ozair; Srivatsan Srinivasan; Caglar Gulcehre; Shangtong Zhang; Ray Jiang; Tom Le Paine; Richard Powell; Konrad Żołna; Julian Schrittwieser; David Choi; Petko Georgiev; Daniel Toyama; Aja Huang; Roman Ring; Igor Babuschkin; Timo Ewalds; Mahyar Bordbar; Sarah Henderson; Sergio Gómez Colmenarejo; Aäron van den Oord; Wojciech Marian Czarnecki; Nando de Freitas; Oriol Vinyals

arXiv:2308.03526·cs.LG·August 8, 2023·5 cites

TL;DR

This paper introduces AlphaStar Unplugged, a large-scale offline RL benchmark for StarCraft II, leveraging a massive human gameplay dataset to develop and evaluate offline RL agents in a complex, real-time strategy environment.

Contribution

It establishes a new offline RL benchmark for StarCraft II, providing datasets, tools, and evaluation protocols, and demonstrates improved agents using only offline data.

Findings

01

Achieved a 90% win rate against the previously published AlphaStar behavior cloning agent, using only offline data.

02

Established a standardized API and evaluation protocol for offline StarCraft II RL.

03

Presented baseline agents, including behavior cloning and offline variants of actor-critic and MuZero.

Abstract

StarCraft II is one of the most challenging simulated reinforcement learning environments; it is partially observable, stochastic, multi-agent, and mastering StarCraft II requires strategic planning over long time horizons with real-time low-level execution. It also has an active professional competitive scene. StarCraft II is uniquely suited for advancing offline RL algorithms, both because of its challenging nature and because Blizzard has released a massive dataset of millions of StarCraft II games played by human players. This paper leverages that and establishes a benchmark, called AlphaStar Unplugged, introducing unprecedented challenges for offline reinforcement learning. We define a dataset (a subset of Blizzard's release), tools standardizing an API for machine learning methods, and an evaluation protocol. We also present baseline agents, including behavior cloning, offline…
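Behavior cloning, the first baseline named in the abstract, treats logged (observation, action) pairs as a supervised learning problem: the policy is trained to maximize the likelihood of the actions the demonstrator took. The following is a minimal sketch of that idea only, not the paper's agent. The linear softmax policy, the toy two-action dataset, and every function name here are assumptions made for illustration; the actual benchmark uses StarCraft II replays and a far larger architecture.

```python
import math
import random


def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def policy_logits(weights, obs):
    # Hypothetical linear policy: logits[a] = sum_i weights[a][i] * obs[i].
    return [sum(w * x for w, x in zip(row, obs)) for row in weights]


def bc_loss(weights, dataset):
    # Mean negative log-likelihood of the logged actions under the policy.
    total = 0.0
    for obs, action in dataset:
        probs = softmax(policy_logits(weights, obs))
        total -= math.log(probs[action])
    return total / len(dataset)


def bc_step(weights, dataset, lr):
    # One full-batch gradient step on the cross-entropy loss.
    n_actions, n_features = len(weights), len(weights[0])
    grads = [[0.0] * n_features for _ in range(n_actions)]
    for obs, action in dataset:
        probs = softmax(policy_logits(weights, obs))
        for a in range(n_actions):
            err = probs[a] - (1.0 if a == action else 0.0)
            for i in range(n_features):
                grads[a][i] += err * obs[i] / len(dataset)
    return [
        [w - lr * g for w, g in zip(w_row, g_row)]
        for w_row, g_row in zip(weights, grads)
    ]


def make_toy_dataset(n, seed=0):
    # Toy stand-in for logged human play (an assumption, not replay data):
    # the "demonstrator" picks action 0 when feature 0 exceeds feature 1.
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        obs = [rng.uniform(-1, 1), rng.uniform(-1, 1), 1.0]  # last entry is a bias
        action = 0 if obs[0] > obs[1] else 1
        data.append((obs, action))
    return data


def train_bc(dataset, n_actions=2, steps=200, lr=0.5):
    # Start from zero weights and fit the policy purely from the fixed dataset;
    # no environment interaction happens anywhere in this loop.
    weights = [[0.0] * len(dataset[0][0]) for _ in range(n_actions)]
    for _ in range(steps):
        weights = bc_step(weights, dataset, lr)
    return weights


if __name__ == "__main__":
    data = make_toy_dataset(200)
    trained = train_bc(data)
    print("final behavior cloning loss:", bc_loss(trained, data))
```

The property that makes this an offline method is that training touches only the fixed dataset; the environment would be needed only for evaluation, for example to measure a win rate against a reference agent.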

Code & Models

Repositories

deepmind/alphastar
jax

Taxonomy

Topics: Artificial Intelligence in Games · Digital Games and Media · Sports Analytics and Performance

Methods: Attention Is All You Need · Label Smoothing · Linear Layer · Adam · Dropout · Absolute Position Encodings · Monte-Carlo Tree Search · Byte Pair Encoding