Human-level Atari 200x faster
Steven Kapturowski, Víctor Campos, Ray Jiang, Nemanja Rakićević, Hado van Hasselt, Charles Blundell, Adrià Puigdomènech Badia

TL;DR
This paper introduces a data-efficient reinforcement learning agent that outperforms the human benchmark on Atari games while requiring 200 times less experience than Agent57.
Contribution
The authors build a more robust and efficient agent by combining trust region methods, normalization schemes, an NFNets-inspired architecture, and policy distillation, which together reduce the amount of experience the agent needs.
Findings
Achieved 200-fold reduction in experience needed compared to Agent57.
Outperformed human benchmarks on all Atari 57 games.
Demonstrated competitive performance with methods like Muesli and MuZero.
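Claims of "outperforming the human benchmark" on Atari are conventionally expressed through a human-normalized score, where 0% corresponds to random play and 100% to the human reference. The sketch below assumes that common convention; the function names and the per-game numbers are hypothetical and are not taken from the paper.

```python
def human_normalized_score(agent_score: float,
                           random_score: float,
                           human_score: float) -> float:
    """Return the score in percent: 0 = random play, 100 = human reference."""
    if human_score == random_score:
        raise ValueError("human and random reference scores must differ")
    return 100.0 * (agent_score - random_score) / (human_score - random_score)


def beats_human_on_all(scores: dict) -> bool:
    """True only if every game exceeds 100% human-normalized score.

    `scores` maps a game name to (agent, random, human) raw scores.
    """
    return all(
        human_normalized_score(agent, rand, human) > 100.0
        for agent, rand, human in scores.values()
    )


# Made-up illustrative values, NOT results from the paper.
example_scores = {
    "game_a": (150.0, 10.0, 110.0),  # 140% of human
    "game_b": (40.0, 0.0, 50.0),     # 80% of human
}

if __name__ == "__main__":
    for name, (agent, rand, human) in example_scores.items():
        print(name, human_normalized_score(agent, rand, human))
    print("above human on all games:", beats_human_on_all(example_scores))
```

Under this convention, a single game below 100% is enough to fail the "all 57 games" criterion, which is why that criterion is stricter than a high mean or median score.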
Abstract
The task of building general agents that perform well over a wide range of tasks has been an important goal in reinforcement learning since its inception. The problem has been the subject of a large body of research, with performance frequently measured by observing scores over the wide range of environments contained in the Atari 57 benchmark. Agent57 was the first agent to surpass the human benchmark on all 57 games, but this came at the cost of poor data-efficiency, requiring nearly 80 billion frames of experience to achieve. Taking Agent57 as a starting point, we employ a diverse set of strategies to achieve a 200-fold reduction of experience needed to outperform the human baseline. We investigate a range of instabilities and bottlenecks we encountered while reducing the data regime, and propose effective solutions to build a more robust and efficient agent. We also demonstrate…
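As a rough check of the scale involved, the two figures quoted in the abstract (nearly 80 billion frames for Agent57, and a 200-fold reduction) imply a budget on the order of a few hundred million frames. The snippet below only does that division; the constant and function names are illustrative, and 80 billion is treated as an approximate upper figure because the abstract says "nearly".

```python
# Figures quoted in the abstract; "nearly 80 billion" is rounded up here,
# so the result is an approximate upper estimate, not an exact number.
AGENT57_FRAMES_APPROX = 80_000_000_000
REDUCTION_FACTOR = 200


def reduced_frame_budget(baseline_frames: int, factor: int) -> int:
    """Frames needed after an N-fold reduction of the baseline requirement."""
    if factor <= 0:
        raise ValueError("reduction factor must be positive")
    return baseline_frames // factor


if __name__ == "__main__":
    budget = reduced_frame_budget(AGENT57_FRAMES_APPROX, REDUCTION_FACTOR)
    print(f"approximate frame budget: {budget:,}")
```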
Taxonomy
Topics: Reinforcement Learning in Robotics · Data Stream Mining Techniques · Explainable Artificial Intelligence (XAI)
Methods: Residual Connection · Batch Normalization · Average Pooling · Monte-Carlo Tree Search · Prioritized Experience Replay · Residual Block · Convolution · MuZero
