Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model

Julian Schrittwieser; Ioannis Antonoglou; Thomas Hubert; Karen Simonyan; Laurent Sifre; Simon Schmitt; Arthur Guez; Edward Lockhart; Demis Hassabis; Thore Graepel; Timothy Lillicrap; David Silver

arXiv:1911.08265·cs.LG·January 27, 2021

TL;DR

MuZero is a novel reinforcement learning algorithm that combines tree search with a learned model to achieve superhuman performance in complex domains like Atari, Go, chess, and shogi without prior knowledge of environment dynamics.

Contribution

The paper introduces MuZero, a model-based reinforcement learning method that learns its model directly from experience. Rather than reconstructing the environment, the model predicts only the quantities most relevant to planning: the reward, the action-selection policy, and the value function.

Findings

01

Achieved a new state of the art when evaluated on 57 Atari games.

02

Matched the superhuman performance of AlphaZero on Go, chess, and shogi without being given the game rules.

03

Demonstrated effectiveness in complex, visually rich environments.

Abstract

Constructing agents with planning capabilities has long been one of the main challenges in the pursuit of artificial intelligence. Tree-based planning methods have enjoyed huge success in challenging domains, such as chess and Go, where a perfect simulator is available. However, in real-world problems the dynamics governing the environment are often complex and unknown. In this work we present the MuZero algorithm which, by combining a tree-based search with a learned model, achieves superhuman performance in a range of challenging and visually complex domains, without any knowledge of their underlying dynamics. MuZero learns a model that, when applied iteratively, predicts the quantities most directly relevant to planning: the reward, the action-selection policy, and the value function. When evaluated on 57 different Atari games - the canonical video game environment for testing AI…
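The idea of a model that is "applied iteratively" to predict reward, policy, and value can be sketched in a few lines. The following is a toy illustration only, not the authors' implementation: the function names `representation`, `dynamics`, `prediction`, and `unroll`, the layer sizes, and the random untrained linear weights are all assumptions chosen to keep the example small.

```python
import math
import random

# Hypothetical toy sizes; nothing here reflects the paper's architecture.
HIDDEN, ACTIONS, OBS = 4, 3, 2
rng = random.Random(0)

def rand_matrix(rows, cols):
    """Random untrained weights, standing in for learned parameters."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

W_repr = rand_matrix(HIDDEN, OBS)                 # observation -> hidden state
W_dyn = rand_matrix(HIDDEN, HIDDEN + ACTIONS)     # (state, action) -> next state
w_reward = rand_matrix(1, HIDDEN + ACTIONS)[0]    # (state, action) -> reward
W_policy = rand_matrix(ACTIONS, HIDDEN)           # state -> policy logits
w_value = rand_matrix(1, HIDDEN)[0]               # state -> value

def representation(observation):
    """Encode a raw observation into an initial hidden state."""
    return [math.tanh(x) for x in matvec(W_repr, observation)]

def dynamics(state, action):
    """Predict the reward and next hidden state for a hypothetical action."""
    one_hot = [1.0 if i == action else 0.0 for i in range(ACTIONS)]
    inputs = state + one_hot
    reward = sum(w * x for w, x in zip(w_reward, inputs))
    next_state = [math.tanh(x) for x in matvec(W_dyn, inputs)]
    return reward, next_state

def prediction(state):
    """Predict a policy (softmax over actions) and a scalar value."""
    logits = matvec(W_policy, state)
    top = max(logits)
    exps = [math.exp(l - top) for l in logits]
    total = sum(exps)
    policy = [e / total for e in exps]
    value = sum(w * x for w, x in zip(w_value, state))
    return policy, value

def unroll(observation, actions):
    """Apply the model iteratively along a sequence of imagined actions."""
    state = representation(observation)
    outputs = []
    for action in actions:
        reward, state = dynamics(state, action)
        policy, value = prediction(state)
        outputs.append((reward, policy, value))
    return outputs

steps = unroll([0.2, -0.1], [0, 2, 1])
```

Note that after the first encoding step the environment is never queried again: every later reward, policy, and value comes from the hidden state alone, which is what lets a tree search plan without a simulator. In a trained system these three predictions would be fitted to observed rewards and search results; here they are meaningless numbers from random weights.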

Code & Models

Videos

MuZero: Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model · YouTube

MuZero: DeepMind’s New AI Mastered More Than 50 Games · YouTube

Harri Valpola: System 2 AI and Planning in Model-Based Reinforcement Learning · YouTube

MuZero - Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model | RL Paper explained · YouTube

Taxonomy

Methods: Batch Normalization · Residual Connection · Convolution · Residual Block · Prioritized Experience Replay · Average Pooling · Monte-Carlo Tree Search · MuZero · AlphaZero