Deep Reinforcement Learning Agents are not even close to Human Intelligence

Quentin Delfosse; Jannis Blüml; Fabian Tatai; Théo Vincent; Bjarne Gregori; Elisabeth Dillies; Jan Peters; Constantin Rothkopf; Kristian Kersting

arXiv:2505.21731·cs.LG·May 29, 2025

TL;DR

Deep reinforcement learning agents struggle with simple task variations, revealing a significant gap in their ability to generalize like humans, and highlighting the need for new benchmarks and testing methodologies.

Contribution

The paper introduces HackAtari, a novel set of simplified task variations for evaluating RL agents, exposing their reliance on shortcuts and lack of zero-shot adaptation.

Findings

01

RL agents perform poorly on simplified tasks compared to humans.

02

Agents rely on shortcuts, leading to performance drops on easier tasks.

03

Current evaluation methods are insufficient for measuring human-like intelligence.

Abstract

Deep reinforcement learning (RL) agents achieve impressive results in a wide variety of tasks, but they lack zero-shot adaptation capabilities. While most robustness evaluations focus on task complexifications, on which humans also struggle to maintain performance, no evaluation has been performed on task simplifications. To tackle this issue, we introduce HackAtari, a set of task variations of the Arcade Learning Environment. We use it to demonstrate that, contrary to humans, RL agents systematically exhibit huge performance drops on simpler versions of their training tasks, uncovering agents' consistent reliance on shortcuts. Our analysis across multiple algorithms and architectures highlights the persistent gap between RL agents and human behavioral intelligence, underscoring the need for new benchmarks and methodologies that enforce systematic generalization testing beyond…
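
The comparison described in the abstract, performance on a training task versus a simplified variant of it, can be sketched as a relative change in mean episodic return. This is a minimal illustration, not the paper's evaluation protocol: the function name `relative_change` is hypothetical, the numbers are made up, and the paper may normalize scores differently.

```python
from statistics import mean


def relative_change(original_returns, variant_returns):
    """Relative change in mean episodic return from the original task to a variant.

    Hypothetical helper for illustration; negative values mean the agent
    scores lower on the variant than on the task it was trained on.
    """
    base = mean(original_returns)
    if base == 0:
        raise ValueError("mean return on the original task is zero; relative change is undefined")
    return (mean(variant_returns) - base) / abs(base)


# Made-up episodic returns for illustration only, not results from the paper.
original = [20.0, 22.0, 18.0]    # agent evaluated on its training task
simplified = [5.0, 7.0, 3.0]     # same agent, zero-shot, on a simplified variant

change = relative_change(original, simplified)
print(f"{change:+.0%}")
```

A strongly negative value on a variant that is easier than the training task is the kind of signal the abstract attributes to shortcut reliance.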

Taxonomy

Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning · Robot Manipulation and Learning