Is Deep Reinforcement Learning Really Superhuman on Atari? Leveling the playing field
Marin Toromanoff, Emilie Wirbel, Fabien Moutarde

TL;DR
This paper introduces SABER, a standardized benchmark for evaluating Deep Reinforcement Learning on Atari, highlights issues with previous performance claims, and presents Rainbow-IQN as a new state-of-the-art method.
Contribution
The paper proposes SABER for reproducible Atari evaluation, questions earlier claims of superhuman performance, and introduces Rainbow-IQN, which achieves improved performance.
Findings
SABER enables consistent evaluation of DRL agents.
Previous superhuman claims may be inaccurate due to evaluation inconsistencies.
Rainbow-IQN sets new state-of-the-art results on Atari.
Abstract
Consistent and reproducible evaluation of Deep Reinforcement Learning (DRL) is not straightforward. In the Arcade Learning Environment (ALE), small changes in environment parameters such as stochasticity or the maximum allowed play time can lead to very different performance. In this work, we discuss the difficulties of comparing different agents trained on ALE. In order to take a step further towards reproducible and comparable DRL, we introduce SABER, a Standardized Atari BEnchmark for general Reinforcement learning algorithms. Our methodology extends previous recommendations and contains a complete set of environment parameters as well as train and test procedures. We then use SABER to evaluate the current state of the art, Rainbow. Furthermore, we introduce a human world records baseline, and argue that previous claims of expert or superhuman performance of DRL might not be…
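To illustrate why the choice of human baseline matters for "superhuman" claims, here is a minimal sketch of a baseline-normalized score. It assumes the widely used normalization in which random play maps to 0 and the reference player maps to 1; whether SABER computes its scores exactly this way is an assumption here, and the function name and all score values below are hypothetical, not taken from the paper.

```python
def normalized_score(agent_score, random_score, reference_score):
    """Rescale a raw game score so that random play is 0.0 and the reference is 1.0."""
    span = reference_score - random_score
    if span == 0:
        raise ValueError("reference and random scores must differ")
    return (agent_score - random_score) / span


# Hypothetical scores for a single game (illustrative only, not from the paper).
random_score = 100.0
human_tester_score = 4_000.0
world_record_score = 100_000.0
agent_score = 5_000.0

# Same agent, same raw score, two different reference baselines.
vs_human_tester = normalized_score(agent_score, random_score, human_tester_score)
vs_world_record = normalized_score(agent_score, random_score, world_record_score)

print(f"vs. human tester: {vs_human_tester:.2f}")
print(f"vs. world record: {vs_world_record:.3f}")
```

With these made-up numbers the agent scores above 1 against the weaker human reference but below 0.05 against the world record, so the same raw result can look superhuman or far from it depending on the baseline.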
Taxonomy
Topics: Reinforcement Learning in Robotics · Evolutionary Algorithms and Applications · Artificial Intelligence in Games
