Mastering Atari Games with Limited Data
Weirui Ye, Shaohuai Liu, Thanard Kurutach, Pieter Abbeel, Yang Gao

TL;DR
EfficientZero is a novel model-based visual reinforcement learning algorithm that achieves super-human performance on Atari games using only 100k environment steps, significantly reducing data requirements compared to previous methods.
Contribution
The paper introduces EfficientZero, a sample-efficient, model-based RL algorithm built on MuZero, achieving state-of-the-art performance with minimal data on Atari and DMControl benchmarks.
Findings
Achieves 194.3% mean and 109.0% median human performance on the Atari 100k benchmark.
Outperforms state-based SAC on some DMControl 100k tasks.
Consumes 500 times less data than DQN to reach comparable performance.
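The mean and median figures above are aggregates of per-game human-normalized scores. A minimal sketch of that calculation follows, assuming the convention common in Atari benchmarks, (agent - random) / (human - random), expressed in percent. The game names and raw scores are hypothetical placeholders, not results from the paper.

```python
from statistics import mean, median


def human_normalized(agent: float, random: float, human: float) -> float:
    """Human-normalized score in percent: 0 = random play, 100 = human level."""
    return 100.0 * (agent - random) / (human - random)


# Hypothetical (agent, random, human) raw scores per game; illustrative only.
raw_scores = {
    "game_a": (30.0, 10.0, 20.0),
    "game_b": (55.0, 5.0, 105.0),
    "game_c": (12.0, 2.0, 12.0),
}

normalized = {game: human_normalized(*s) for game, s in raw_scores.items()}
values = list(normalized.values())

mean_hns = mean(values)      # pulled upward by games with very high scores
median_hns = median(values)  # robust to a few outlier games

print(normalized, mean_hns, median_hns)
```

Reporting both aggregates is useful because a few games with very large normalized scores can raise the mean well above the median.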
Abstract
Reinforcement learning has achieved great success in many applications. However, sample efficiency remains a key challenge, with prominent methods requiring millions (or even billions) of environment steps to train. Recently, there has been significant progress in sample-efficient image-based RL algorithms; however, consistent human-level performance on the Atari game benchmark remains an elusive goal. We propose a sample-efficient model-based visual RL algorithm built on MuZero, which we name EfficientZero. Our method achieves 194.3% mean human performance and 109.0% median performance on the Atari 100k benchmark with only two hours of real-time game experience and outperforms state-based SAC in some tasks on the DMControl 100k benchmark. This is the first time an algorithm has achieved super-human performance on Atari games with so little data. EfficientZero's performance is also close to…
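The "two hours" and "500 times less data" figures can be sanity-checked with simple arithmetic. The sketch below rests on assumptions that this page does not state: a frame skip of 4 (the usual Atari setting), a 60 frames-per-second emulator, and a DQN reference budget of 200 million frames.

```python
ENV_STEPS = 100_000        # Atari 100k budget, as stated in the text
FRAME_SKIP = 4             # assumption: standard Atari frame skip
FPS = 60                   # assumption: emulator runs at 60 frames per second
DQN_FRAMES = 200_000_000   # assumption: reference DQN training budget in frames

# Each agent step advances the emulator by FRAME_SKIP frames.
frames = ENV_STEPS * FRAME_SKIP

# Convert frames to hours of real-time play.
hours = frames / FPS / 3600

# Ratio of the assumed DQN budget to the 100k-step budget.
data_ratio = DQN_FRAMES / frames

print(f"{frames} frames, {hours:.2f} hours, {data_ratio:.0f}x less data")
```

Under these assumptions the budget is 400,000 frames, or roughly 1.85 hours of play, which is consistent with the "two hours" and "500 times" claims.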
Taxonomy
Topics: Reinforcement Learning in Robotics · Human Pose and Action Recognition · Multimodal Machine Learning Applications
Methods: Batch Normalization · Residual Connection · Residual Block · Average Pooling · Global Average Pooling · Dilated Convolution · 1x1 Convolution · Switchable Atrous Convolution · Convolution
