Performance Asymmetry in Model-Based Reinforcement Learning

Jing Yu Lim; Rushi Shah; Zarif Ikram; Samson Yu; Haozhe Ma; Tze-Yun Leong; Dianbo Liu

arXiv:2505.19698 · cs.LG · February 25, 2026

TL;DR

This paper uncovers a significant performance asymmetry in Model-Based Reinforcement Learning across Atari tasks, introduces a balanced evaluation metric, and proposes a novel world model to improve performance consistency and efficiency.

Contribution

It identifies and analyzes performance asymmetry in MBRL, introduces Sym-HNS for balanced evaluation, and develops the JEDI world model to mitigate asymmetry and enhance performance.

Findings

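01. MBRL agents outperform humans on Agent-Optimal tasks.

02. MBRL agents underperform humans on Human-Optimal tasks, with a 21X performance gap between the Human-Optimal and Agent-Optimal subsets for the SOTA agent.

03. The JEDI world model achieves state-of-the-art results and reduces performance asymmetry.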

Abstract

Recently, Model-Based Reinforcement Learning (MBRL) agents have achieved super-human performance on the Atari100k benchmark on average. However, we discover that conventional aggregates mask a major problem, Performance Asymmetry: MBRL agents dramatically outperform humans in certain tasks (Agent-Optimal tasks) while drastically underperforming humans in other tasks (Human-Optimal tasks). Indeed, despite achieving SOTA in the overall mean Human-Normalized Score (HNS), the SOTA agent scored the worst among baselines on Human-Optimal tasks, with a striking 21X performance gap between the Human-Optimal and Agent-Optimal subsets. To address this, we partition Atari100k evenly into Human-Optimal and Agent-Optimal subsets, and introduce a more balanced aggregate, Sym-HNS. Furthermore, we trace the striking Performance Asymmetry in the SOTA pixel diffusion world model to the curse of…
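To make the aggregation issue concrete, here is a minimal sketch of how a mean Human-Normalized Score can hide a gap between two task subsets. It uses the standard HNS definition, (agent - random) / (human - random). Everything else is an assumption for illustration: the game names and raw scores are hypothetical, and the even split by ranking games on HNS is one plausible reading of "partition evenly", not necessarily the paper's procedure. The sketch does not compute Sym-HNS, whose definition is not given in this summary.

```python
from statistics import mean


def human_normalized_score(agent: float, random: float, human: float) -> float:
    """Standard HNS: 0.0 matches a random policy, 1.0 matches the human reference."""
    return (agent - random) / (human - random)


def split_evenly_by_hns(hns_by_game: dict) -> tuple:
    """Rank games by HNS and cut the ranking in half.

    Assumption: the lower half stands in for Human-Optimal tasks and the
    upper half for Agent-Optimal tasks. The paper's actual criterion may differ.
    """
    ranked = sorted(hns_by_game, key=hns_by_game.get)
    half = len(ranked) // 2
    return ranked[:half], ranked[half:]


# Hypothetical raw scores as (agent, random, human); not taken from the paper.
raw_scores = {
    "game_a": (50.0, 0.0, 100.0),
    "game_b": (20.0, 10.0, 110.0),
    "game_c": (300.0, 0.0, 100.0),
    "game_d": (520.0, 20.0, 120.0),
}

hns = {game: human_normalized_score(*s) for game, s in raw_scores.items()}
human_optimal, agent_optimal = split_evenly_by_hns(hns)

overall_mean = mean(hns.values())
human_optimal_mean = mean(hns[g] for g in human_optimal)
agent_optimal_mean = mean(hns[g] for g in agent_optimal)

# Ratio between the subset means: the kind of gap a single overall mean hides.
gap = agent_optimal_mean / human_optimal_mean

print(f"overall mean HNS:       {overall_mean:.2f}")
print(f"Human-Optimal mean HNS: {human_optimal_mean:.2f}")
print(f"Agent-Optimal mean HNS: {agent_optimal_mean:.2f}")
print(f"subset gap (ratio):     {gap:.1f}x")
```

With these toy numbers the overall mean is above 1.0 (nominally super-human) even though the agent falls well short of the human reference on half of the games, which is the pattern the abstract calls Performance Asymmetry.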

Taxonomy

Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning

Methods: Diffusion