HarmonyDream: Task Harmonization Inside World Models

Haoyu Ma; Jialong Wu; Ningya Feng; Chenjun Xiao; Dong Li; Jianye Hao; Jianmin Wang; Mingsheng Long

arXiv:2310.00344 · cs.LG · June 6, 2024 · 1 citation

TL;DR

HarmonyDream enhances model-based reinforcement learning by dynamically balancing observation and reward modeling, leading to significant performance improvements on robotic and Atari benchmarks.

Contribution

It introduces HarmonyDream, a method that automatically adjusts loss coefficients to harmonize tasks within world models, improving sample efficiency and performance.
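A minimal sketch of what automatically adjusted loss coefficients can look like, offered as an illustration and not as the authors' implementation. This summary does not give the formula, so the sketch assumes a learnable-scale weighting of the form L_i / sigma_i + log(1 + sigma_i), where each task i has its own scale sigma_i = exp(s_i). The function names, the loss values, and the plain gradient-descent loop are all hypothetical.

```python
import math

# ASSUMPTION: the weighting form below is illustrative; this page does not
# state the exact objective used by HarmonyDream.

def harmonized_loss(losses, log_sigmas):
    """Return sum_i L_i / sigma_i + log(1 + sigma_i), with sigma_i = exp(s_i)."""
    total = 0.0
    for name, loss in losses.items():
        sigma = math.exp(log_sigmas[name])
        total += loss / sigma + math.log1p(sigma)
    return total

def log_sigma_grads(losses, log_sigmas):
    """Analytic gradient of harmonized_loss with respect to each s_i."""
    grads = {}
    for name, loss in losses.items():
        sigma = math.exp(log_sigmas[name])
        # d/ds [L * exp(-s)] = -L / sigma ; d/ds [log(1 + exp(s))] = sigma / (1 + sigma)
        grads[name] = -loss / sigma + sigma / (1.0 + sigma)
    return grads

def fit_log_sigmas(losses, steps=2000, lr=0.05):
    """Gradient descent on the log-scales while the task losses are held fixed."""
    log_sigmas = {name: 0.0 for name in losses}
    for _ in range(steps):
        grads = log_sigma_grads(losses, log_sigmas)
        for name in log_sigmas:
            log_sigmas[name] -= lr * grads[name]
    return log_sigmas

# Hypothetical loss magnitudes: a large observation loss and a small reward loss.
example_losses = {"observation": 50.0, "reward": 0.1}
fitted = fit_log_sigmas(example_losses)

# The effective loss coefficient of each task is 1 / sigma_i.
coefficients = {name: math.exp(-s) for name, s in fitted.items()}
scaled_losses = {name: example_losses[name] * coefficients[name] for name in example_losses}
```

Under this assumed form, the task with the larger raw loss ends up with the smaller coefficient, and every scaled loss L_i / sigma_i settles below 1, so neither task dominates the sum. In a real world model the scales would be trained jointly with the network parameters instead of against fixed loss values.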

Findings

01. Achieves 10%-69% performance boost on robotic tasks

02. Sets new state-of-the-art on Atari 100K benchmark

03. Demonstrates the importance of task harmonization in world models

Abstract

Model-based reinforcement learning (MBRL) holds the promise of sample-efficient learning by utilizing a world model, which models how the environment works and typically encompasses components for two tasks: observation modeling and reward modeling. In this paper, through a dedicated empirical investigation, we gain a deeper understanding of the role each task plays in world models and uncover the overlooked potential of sample-efficient MBRL by mitigating the domination of either observation or reward modeling. Our key insight is that while prevalent approaches of explicit MBRL attempt to restore abundant details of the environment via observation models, it is difficult due to the environment's complexity and limited model capacity. On the other hand, reward models, while dominating implicit MBRL and adept at learning compact task-centric dynamics, are inadequate for sample-efficient…

Code & Models

Repositories

thuml/harmonydream (JAX, official)

Taxonomy

Topics: Reinforcement Learning in Robotics

Methods: Balanced Selection