A Review for Deep Reinforcement Learning in Atari: Benchmarks, Challenges, and Solutions
Jiajun Fan

TL;DR
This paper reviews deep reinforcement learning in Atari, critiques current evaluation metrics, proposes a new benchmark based on human world records, and discusses challenges and solutions for surpassing human performance.
Contribution
It introduces a novel Atari benchmark based on human world records and analyzes the limitations of current evaluation criteria in RL research.
Findings
Current evaluation metrics underestimate human performance.
Proposed benchmark raises the bar for RL agents.
Identified four key challenges hindering superhuman performance.
Abstract
The Arcade Learning Environment (ALE) was proposed as an evaluation platform for empirically assessing the generality of agents across dozens of Atari 2600 games. ALE offers a variety of challenging problems and has drawn significant attention from the deep reinforcement learning (RL) community. From Deep Q-Networks (DQN) to Agent57, RL agents seem to achieve superhuman performance in ALE. However, is this the case? To explore this question, we first review the current evaluation metrics in the Atari benchmarks and then show that the current criteria for claiming superhuman performance are inappropriate, because they underestimate human performance relative to what is possible. To address these problems and promote the development of RL research, we propose a novel Atari benchmark based on human world records (HWR), which puts forward higher requirements for RL agents…
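To make the metric critique concrete, here is a minimal sketch of how the choice of human reference changes a normalized score. It assumes the commonly used normalization in which random play maps to 0 and the reference score maps to 1. The function name `normalized_score` and all numbers are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def normalized_score(agent: float, random: float, reference: float) -> float:
    """Rescale a raw game score so random play maps to 0 and the reference to 1."""
    if reference == random:
        raise ValueError("reference and random scores must differ")
    return (agent - random) / (reference - random)


# Hypothetical raw scores for a single game (illustrative only).
random_score = 100.0     # random-policy baseline
human_score = 1_100.0    # average human baseline
world_record = 10_100.0  # human world record (HWR)
agent_score = 2_100.0    # RL agent

# Normalized against the average human baseline.
hns = normalized_score(agent_score, random_score, human_score)
# Normalized against the human world record.
hwrns = normalized_score(agent_score, random_score, world_record)

print(f"human-normalized score: {hns:.2f}")          # 2.00
print(f"world-record-normalized score: {hwrns:.2f}")  # 0.20
```

With these made-up numbers, the same agent looks "superhuman" (2.00) against the average human baseline but reaches only 0.20 of the world record, which is the kind of gap the abstract points to.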
Taxonomy
Topics: Reinforcement Learning in Robotics · Digital Games and Media · Artificial Intelligence in Games
