Performance Asymmetry in Model-Based Reinforcement Learning
Jing Yu Lim, Rushi Shah, Zarif Ikram, Samson Yu, Haozhe Ma, Tze-Yun Leong, Dianbo Liu

TL;DR
This paper uncovers a significant performance asymmetry in Model-Based Reinforcement Learning across Atari tasks, introduces a balanced evaluation metric, and proposes a novel world model to improve performance consistency and efficiency.
Contribution
The paper identifies and analyzes performance asymmetry in MBRL, introduces Sym-HNS for balanced evaluation, and develops the JEDI world model to mitigate asymmetry and enhance performance.
Findings
MBRL agents outperform humans in Agent-Optimal tasks
MBRL agents underperform humans in Human-Optimal tasks; the SOTA agent shows a 21X performance gap between the Human-Optimal and Agent-Optimal subsets
JEDI model achieves state-of-the-art results and reduces performance asymmetry
Abstract
Recently, Model-Based Reinforcement Learning (MBRL) has achieved super-human level performance on the Atari100k benchmark on average. However, we discover that conventional aggregates mask a major problem, Performance Asymmetry: MBRL agents dramatically outperform humans in certain tasks (Agent-Optimal tasks) while drastically underperforming humans in other tasks (Human-Optimal tasks). Indeed, despite achieving SOTA in the overall mean Human-Normalized Scores (HNS), the SOTA agent scored the worst among baselines on Human-Optimal tasks, with a striking 21X performance gap between the Human-Optimal and Agent-Optimal subsets. To address this, we partition Atari100k evenly into Human-Optimal and Agent-Optimal subsets, and introduce a more balanced aggregate, Sym-HNS. Furthermore, we trace the striking Performance Asymmetry in the SOTA pixel diffusion world model to the curse of…
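As a rough illustration of how an overall mean can mask the asymmetry the abstract describes, the sketch below computes the standard Human-Normalized Score, (agent - random) / (human - random), and compares the overall mean with per-subset means. The task names, raw scores, and subset assignment are hypothetical placeholders, not values from the paper, and the sketch does not implement Sym-HNS, whose definition is not given in this summary.

```python
from statistics import mean


def human_normalized_score(agent: float, random: float, human: float) -> float:
    """Standard HNS: 0.0 matches a random policy, 1.0 matches the human reference."""
    return (agent - random) / (human - random)


def subset_mean_hns(hns_by_task: dict[str, float], tasks: set[str]) -> float:
    """Mean HNS restricted to one subset of tasks."""
    return mean(hns_by_task[task] for task in tasks)


# Hypothetical raw scores per task: (agent, random, human). Not from the paper.
raw_scores = {
    "task_a": (900.0, 100.0, 500.0),
    "task_b": (1300.0, 100.0, 500.0),
    "task_c": (20.0, 0.0, 100.0),
    "task_d": (30.0, 0.0, 100.0),
}

# Hypothetical, evenly sized partition (the paper's partition rule is not shown here).
agent_optimal = {"task_a", "task_b"}
human_optimal = {"task_c", "task_d"}

hns = {task: human_normalized_score(*scores) for task, scores in raw_scores.items()}

overall_mean = mean(hns.values())
agent_optimal_mean = subset_mean_hns(hns, agent_optimal)
human_optimal_mean = subset_mean_hns(hns, human_optimal)

# Ratio between subset means: one simple way to express a performance gap.
gap = agent_optimal_mean / human_optimal_mean

print(f"overall mean HNS:       {overall_mean:.3f}")
print(f"Agent-Optimal mean HNS: {agent_optimal_mean:.3f}")
print(f"Human-Optimal mean HNS: {human_optimal_mean:.3f}")
print(f"subset gap (ratio):     {gap:.1f}x")
```

With these placeholder numbers the overall mean sits above 1.0 (nominally super-human) even though the agent reaches only a fraction of human performance on half of the tasks, which is the kind of masking a balanced aggregate is meant to expose.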
Taxonomy
Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning
Methods: Diffusion
