HarmonyDream: Task Harmonization Inside World Models
Haoyu Ma, Jialong Wu, Ningya Feng, Chenjun Xiao, Dong Li, Jianye Hao, Jianmin Wang, Mingsheng Long

TL;DR
HarmonyDream enhances model-based reinforcement learning by dynamically balancing observation and reward modeling, leading to significant performance improvements on robotic and Atari benchmarks.
Contribution
The paper introduces HarmonyDream, a method that automatically adjusts the loss coefficients of the observation- and reward-modeling tasks to harmonize them within the world model, improving sample efficiency and final performance.
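This summary does not give the exact rule HarmonyDream uses to adjust its loss coefficients. The sketch below is only an illustration of automatic coefficient adjustment in general. It assumes a learnable per-task scale sigma_i = exp(s_i) and a hypothetical harmonizer L_i / sigma_i + log(1 + sigma_i). The function names and the loss values 2000.0 and 0.5 are made up for illustration. To isolate the effect, the task losses are held fixed while the scales are fitted.

```python
import math


def harmonized_loss(losses, log_sigmas):
    """Sum per-task losses using learnable scales sigma_i = exp(s_i).

    Hypothetical harmonizer: L_i / sigma_i + log(1 + sigma_i).
    The log term penalizes large sigma_i, so a task weight 1 / sigma_i
    cannot be driven to zero for free.
    """
    total = 0.0
    for loss, s in zip(losses, log_sigmas):
        sigma = math.exp(s)
        total += loss / sigma + math.log1p(sigma)
    return total


def grad_log_sigma(loss, s):
    """Derivative w.r.t. s of L * exp(-s) + log(1 + exp(s)) for one task."""
    sigma = math.exp(s)
    return -loss / sigma + sigma / (1.0 + sigma)


def fit_scales(losses, steps=5000, lr=0.05):
    """Gradient descent on each s_i with the task losses held fixed.

    In a real world model the losses change every step and the scales are
    trained jointly with the network. Gradients are clipped to [-1, 1] so a
    very large loss cannot push exp(s) into overflow.
    """
    log_sigmas = [0.0] * len(losses)
    for _ in range(steps):
        log_sigmas = [
            s - lr * max(-1.0, min(1.0, grad_log_sigma(loss, s)))
            for loss, s in zip(losses, log_sigmas)
        ]
    return [math.exp(s) for s in log_sigmas]


def closed_form_sigma(loss):
    """Stationary point of L / sigma + log(1 + sigma): sigma^2 - L*sigma - L = 0."""
    return (loss + math.sqrt(loss * loss + 4.0 * loss)) / 2.0


# Illustrative values: a large observation loss and a small reward loss.
obs_loss, reward_loss = 2000.0, 0.5
sigmas = fit_scales([obs_loss, reward_loss])
weighted = [loss / sigma for loss, sigma in zip([obs_loss, reward_loss], sigmas)]
print(sigmas, weighted)
```

Under this assumed harmonizer, setting the derivative with respect to sigma to zero gives L / sigma = sigma / (1 + sigma). As a result, every weighted term lands in (0, 1), whatever the raw loss magnitude is. In this example the observation loss is about four thousand times the reward loss, yet neither weighted term can dominate the sum purely through its scale.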
Findings
Achieves 10%-69% performance boost on robotic tasks
Sets new state-of-the-art on Atari 100K benchmark
Demonstrates the importance of task harmonization in world models
Abstract
Model-based reinforcement learning (MBRL) holds the promise of sample-efficient learning by utilizing a world model, which models how the environment works and typically encompasses components for two tasks: observation modeling and reward modeling. In this paper, through a dedicated empirical investigation, we gain a deeper understanding of the role each task plays in world models and uncover the overlooked potential of sample-efficient MBRL by mitigating the domination of either observation or reward modeling. Our key insight is that while prevalent approaches of explicit MBRL attempt to restore abundant details of the environment via observation models, it is difficult due to the environment's complexity and limited model capacity. On the other hand, reward models, while dominating implicit MBRL and adept at learning compact task-centric dynamics, are inadequate for sample-efficient…
Taxonomy
Topics: Reinforcement Learning in Robotics
Methods: Balanced Selection
