Diffusion World Model: Future Modeling Beyond Step-by-Step Rollout for Offline Reinforcement Learning

Zihan Ding; Amy Zhang; Yuandong Tian; Qinqing Zheng

arXiv:2402.03570 · cs.LG · October 17, 2024 · 1 citation

TL;DR

The paper introduces Diffusion World Model (DWM), a conditional diffusion model that predicts multistep future states and rewards in a single forward pass for offline reinforcement learning. On D4RL, it is robust to long-horizon simulation and outperforms one-step dynamics models.

Contribution

DWM is the first diffusion-based model capable of multistep future prediction in offline RL, enabling efficient long-horizon simulation without recursive queries.
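
To make the contrast with recursive queries concrete, the sketch below compares a step-by-step rollout with a single-query rollout. It is only an illustration: `ToyWorldModel`, its linear dynamics, and the function names are hypothetical stand-ins, not the paper's model or code. The point is the query pattern, where a one-step model is called once per simulated step and a multistep model is called once for the whole horizon.

```python
import numpy as np


class ToyWorldModel:
    """Hypothetical stand-in for a learned model; counts how often it is queried."""

    def __init__(self):
        self.calls = 0

    def step(self, state, action):
        # One-step interface: predict only the next state and reward.
        self.calls += 1
        next_state = 0.9 * state + action
        reward = -float(np.sum(next_state ** 2))
        return next_state, reward

    def sequence(self, state, horizon):
        # Multistep interface: predict all future states and rewards at once.
        # (Toy closed form that assumes zero actions; a real model would be learned.)
        self.calls += 1
        decay = 0.9 ** np.arange(1, horizon + 1)
        states = decay[:, None] * state[None, :]
        rewards = -np.sum(states ** 2, axis=1)
        return states, rewards


def rollout_recursive(model, state, actions):
    """Feed each predicted state back into the model to get the next one."""
    states, rewards = [], []
    for action in actions:
        state, reward = model.step(state, action)
        states.append(state)
        rewards.append(reward)
    return np.stack(states), np.array(rewards)


def rollout_single_pass(model, state, horizon):
    """Obtain the whole future trajectory from a single model query."""
    return model.sequence(state, horizon)
```

With a horizon of 8, `rollout_recursive` queries the toy model 8 times, each time on its own previous prediction, while `rollout_single_pass` queries it once.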

Findings

01. DWM achieves a 44% performance gain over one-step dynamics models.

02. Experiments on D4RL confirm DWM's robustness to long-horizon simulation.

03. DWM's performance is comparable or superior to model-free methods.

Abstract

We introduce Diffusion World Model (DWM), a conditional diffusion model capable of predicting multistep future states and rewards concurrently. As opposed to traditional one-step dynamics models, DWM offers long-horizon predictions in a single forward pass, eliminating the need for recursive queries. We integrate DWM into model-based value estimation, where the short-term return is simulated by future trajectories sampled from DWM. In the context of offline reinforcement learning, DWM can be viewed as a conservative value regularization through generative modeling. Alternatively, it can be seen as a data source that enables offline Q-learning with synthetic data. Our experiments on the D4RL dataset confirm the robustness of DWM to long-horizon simulation. In terms of absolute performance, DWM significantly surpasses one-step dynamics models with a $44\%$ performance gain, and is…
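
The value-estimation idea in the abstract can be sketched as follows. This is a minimal illustration and not the authors' implementation. It assumes the common model-based value expansion form, in which the discounted sum of simulated rewards is added to a discounted bootstrap value at the end of the horizon. The function name `h_step_value_target`, the `terminal_value` argument, and the default `gamma` are assumptions introduced here. The rewards would come from a trajectory sampled from the world model.

```python
import numpy as np


def h_step_value_target(rewards, terminal_value, gamma=0.99):
    """Value target from simulated rewards plus a bootstrapped tail.

    rewards: array of shape (H,) or (batch, H), rewards along a sampled trajectory.
    terminal_value: scalar or array of shape (batch,), an assumed value
        estimate at the end of the simulated horizon (e.g. from a target Q-network).
    """
    rewards = np.asarray(rewards, dtype=float)
    horizon = rewards.shape[-1]
    discounts = gamma ** np.arange(horizon)  # 1, gamma, gamma^2, ...
    short_term_return = np.sum(discounts * rewards, axis=-1)
    return short_term_return + gamma ** horizon * np.asarray(terminal_value, dtype=float)
```

For example, with rewards `[1, 1, 1]`, a terminal value of `10`, and `gamma=0.5`, the short-term return is `1 + 0.5 + 0.25 = 1.75` and the bootstrapped tail is `0.125 * 10 = 1.25`, giving a target of `3.0`.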

Taxonomy

Topics: Opinion Dynamics and Social Influence · Complex Systems and Time Series Analysis

Methods: Diffusion · Q-Learning