Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model
Julian Schrittwieser, Ioannis Antonoglou, Thomas Hubert, Karen Simonyan, Laurent Sifre, Simon Schmitt, Arthur Guez, Edward Lockhart, Demis Hassabis, Thore Graepel, Timothy Lillicrap, David Silver

TL;DR
MuZero is a novel reinforcement learning algorithm that combines tree search with a learned model to achieve superhuman performance in complex domains like Atari, Go, chess, and shogi without prior knowledge of environment dynamics.
Contribution
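The paper introduces MuZero, a model-based reinforcement learning method that learns directly from raw observations. Instead of reconstructing the environment's full dynamics, it learns a model that predicts only the quantities planning needs (reward, policy, and value), and uses that model inside a tree search. This approach outperforms previous algorithms on Atari.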
Findings
Achieved state-of-the-art results on 57 Atari games.
Matched the superhuman performance of AlphaZero in Go, chess, and shogi without being given the game rules.
Demonstrated effectiveness in complex, visually rich environments.
Abstract
Constructing agents with planning capabilities has long been one of the main challenges in the pursuit of artificial intelligence. Tree-based planning methods have enjoyed huge success in challenging domains, such as chess and Go, where a perfect simulator is available. However, in real-world problems the dynamics governing the environment are often complex and unknown. In this work we present the MuZero algorithm which, by combining a tree-based search with a learned model, achieves superhuman performance in a range of challenging and visually complex domains, without any knowledge of their underlying dynamics. MuZero learns a model that, when applied iteratively, predicts the quantities most directly relevant to planning: the reward, the action-selection policy, and the value function. When evaluated on 57 different Atari games - the canonical video game environment for testing AI…
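The idea of a model that, "when applied iteratively," predicts reward, policy, and value can be illustrated with a small sketch. In the paper, the model is split into three learned networks: a representation function h (observation to hidden state), a dynamics function g (hidden state and action to reward and next hidden state), and a prediction function f (hidden state to policy and value). Here those networks are replaced by hypothetical toy Python functions (`toy_represent`, `toy_dynamics`, `toy_predict`), and the class name `MuZeroModel` is an illustrative assumption. The sketch only shows the data flow of unrolling the model along a fixed action sequence. It does not implement the tree search or the training procedure.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

State = Tuple[float, ...]


@dataclass
class MuZeroModel:
    """Hypothetical container for the three model functions h, g and f."""
    represent: Callable[[Sequence[float]], State]            # h: observation -> hidden state
    dynamics: Callable[[State, int], Tuple[float, State]]     # g: (state, action) -> (reward, next state)
    predict: Callable[[State], Tuple[List[float], float]]     # f: state -> (policy, value)

    def unroll(self, observation: Sequence[float], actions: Sequence[int]
               ) -> List[Tuple[Optional[float], List[float], float]]:
        """Apply the model iteratively; the real environment is only seen once, at the root."""
        state = self.represent(observation)
        policy, value = self.predict(state)
        outputs: List[Tuple[Optional[float], List[float], float]] = [(None, policy, value)]  # no reward at the root
        for action in actions:
            reward, state = self.dynamics(state, action)  # step purely in hidden-state space
            policy, value = self.predict(state)
            outputs.append((reward, policy, value))
        return outputs


# Toy stand-ins for the learned networks (hypothetical, for illustration only).
def toy_represent(obs: Sequence[float]) -> State:
    # The hidden state is simply the observation as a tuple.
    return tuple(float(x) for x in obs)


def toy_dynamics(state: State, action: int) -> Tuple[float, State]:
    # Reward is the state entry for the chosen action; that entry is then halved.
    reward = state[action]
    next_state = tuple(x / 2 if i == action else x for i, x in enumerate(state))
    return reward, next_state


def toy_predict(state: State) -> Tuple[List[float], float]:
    # Policy is a softmax over the state entries; value is their sum.
    exps = [math.exp(x) for x in state]
    total = sum(exps)
    return [e / total for e in exps], sum(state)


if __name__ == "__main__":
    model = MuZeroModel(toy_represent, toy_dynamics, toy_predict)
    trajectory = model.unroll([1.0, 2.0], actions=[1, 1, 0])
    print([r for r, _, _ in trajectory])  # prints [None, 2.0, 1.0, 1.0]
```

In MuZero, the tree search calls g and f in this way at every simulated step. That is why the model only needs to predict reward, policy, and value rather than future observations.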
Videos
MuZero: Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model· youtube
MuZero: DeepMind’s New AI Mastered More Than 50 Games· youtube
Harri Valpola: System 2 AI and Planning in Model-Based Reinforcement Learning· youtube
MuZero - Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model | RL Paper explained· youtube
Taxonomy
Methods: Batch Normalization · Residual Connection · Convolution · Residual Block · Prioritized Experience Replay · Average Pooling · Monte-Carlo Tree Search · MuZero · AlphaZero
