Atari-5: Distilling the Arcade Learning Environment down to Five Games
Matthew Aitchison, Penny Sweetser, Marcus Hutter

TL;DR
This paper introduces Atari-5, a small, representative subset of the Arcade Learning Environment that accurately approximates the full 57-game benchmark, reducing computational costs and improving reproducibility.
Contribution
The authors present a principled methodology for selecting minimal yet representative subsets of environments from a benchmark suite. Applied to the ALE, it yields Atari-5, whose 57-game median score estimates fall within 10% of their true values.
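This page does not spell out the selection procedure, so the following is only a minimal sketch under stated assumptions. It assumes that each candidate subset of k games is scored by how well a linear regression on the subset's log-scores predicts a target summary score (for example, the log of each algorithm's 57-game median), that the error is measured by leave-one-out cross-validation, and that every subset is searched exhaustively. The data layout, the helper names (`fit_ols`, `loo_mse`, `best_subset`), the small ridge term, and the toy numbers are all hypothetical.

```python
import itertools
import statistics


def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        if abs(m[col][col]) < 1e-12:
            raise ValueError("singular system")
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ols(rows, targets, ridge=1e-8):
    """Least-squares fit with an intercept via the normal equations.

    A tiny ridge term is added to the diagonal for numerical stability
    (an assumption of this sketch). Returns [bias, w_1, ..., w_k].
    """
    X = [[1.0] + list(r) for r in rows]
    p = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) + (ridge if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(x[i] * t for x, t in zip(X, targets)) for i in range(p)]
    return solve(xtx, xty)


def predict(w, row):
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], row))


def loo_mse(rows, targets):
    """Leave-one-out cross-validated mean squared error of the linear fit."""
    errs = []
    for i in range(len(rows)):
        w = fit_ols(rows[:i] + rows[i + 1:], targets[:i] + targets[i + 1:])
        errs.append((predict(w, rows[i]) - targets[i]) ** 2)
    return statistics.fmean(errs)


def best_subset(log_scores, targets, k):
    """Exhaustively score every k-game subset; return (lowest LOO error, subset).

    log_scores[a][g] is the log-score of algorithm a on game g (hypothetical layout).
    """
    n_games = len(log_scores[0])
    best = None
    for subset in itertools.combinations(range(n_games), k):
        rows = [[row[g] for g in subset] for row in log_scores]
        err = loo_mse(rows, targets)
        if best is None or err < best[0]:
            best = (err, subset)
    return best


# Toy data (illustrative only): 5 algorithms x 4 games of log-scores.
log_scores = [
    [1.0, 3.0, 2.0, 0.5],
    [2.0, 1.0, 4.0, 1.5],
    [0.5, 2.5, 1.0, 3.0],
    [3.0, 0.0, 3.0, 2.0],
    [1.5, 2.0, 5.0, 0.0],
]
# Hypothetical target: an exact linear function of game 2.
targets = [0.5 * row[2] + 1.0 for row in log_scores]
print(best_subset(log_scores, targets, k=1))
```

On this toy data the target is an exact linear function of game 2, so the single-game search selects `(2,)` with near-zero error. Exhaustive search grows combinatorially with k, which stays manageable for small subsets such as five games drawn from 57, but a real implementation might prune or parallelize the search.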
Findings
Atari-5 approximates 57-game median scores within 10%.
A 10-game subset recovers 80% of the variance in log-scores across all 57 games.
High correlation among ALE games enables effective compression.
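To make "variance recovered" and its link to correlation concrete, here is a simplified sketch. It assumes scores are already log-transformed, uses a single predictor game rather than ten, and pools the variance over all games (including the predictor). The helper names, game names, and numbers are hypothetical and are not taken from the paper.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length lists of log-scores."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def variance_recovered(log_scores, predictor):
    """Share of the total variance in every game's log-scores explained by a
    simple linear fit on a single predictor game (hypothetical helper).

    log_scores maps a game name to the log-scores of each algorithm on it,
    in the same algorithm order for every game. Variance is pooled over all
    games, including the predictor itself.
    """
    x = log_scores[predictor]
    mx = sum(x) / len(x)
    sxx = sum((a - mx) ** 2 for a in x)
    sse = sst = 0.0
    for y in log_scores.values():
        my = sum(y) / len(y)
        slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
        intercept = my - slope * mx
        sse += sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
        sst += sum((b - my) ** 2 for b in y)
    return 1.0 - sse / sst


# Illustrative log-scores for 4 algorithms on 3 hypothetical games.
log_scores = {
    "game_a": [1.0, 2.0, 3.0, 4.0],
    "game_b": [2.0, 4.0, 6.0, 8.0],  # perfectly correlated with game_a
    "game_c": [1.0, 3.0, 2.0, 4.0],  # correlation 0.8 with game_a
}
print(pearson(log_scores["game_a"], log_scores["game_c"]))
print(variance_recovered(log_scores, "game_a"))
```

Here `game_a` alone explains about 94% of the pooled variance: `game_b` is reconstructed exactly, while `game_c` (correlation 0.8) keeps only R² = 0.64 of its own variance. The more strongly the remaining games correlate with the chosen subset, the closer this fraction gets to 1, which is the intuition behind compressing the benchmark.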
Abstract
The Arcade Learning Environment (ALE) has become an essential benchmark for assessing the performance of reinforcement learning algorithms. However, the computational cost of generating results on the entire 57-game dataset limits ALE's use and makes the reproducibility of many results infeasible. We propose a novel solution to this problem in the form of a principled methodology for selecting small but representative subsets of environments within a benchmark suite. We applied our method to identify a subset of five ALE games, called Atari-5, which produces 57-game median score estimates within 10% of their true values. Extending the subset to 10 games recovers 80% of the variance for log-scores for all games within the 57-game set. We show this level of compression is possible due to a high degree of correlation between many of the games in ALE.
