World Models via Policy-Guided Trajectory Diffusion
Marc Rigter, Jun Yamada, Ingmar Posner

TL;DR
This paper introduces PolyGRAD, a non-autoregressive diffusion-based world model that generates entire trajectories in one pass, enabling efficient on-policy reinforcement learning in continuous control tasks.
Contribution
PolyGRAD is a novel, non-autoregressive world modelling approach in which a diffusion model denoises an entire trajectory while the gradient of the policy's action distribution guides the actions, avoiding the step-by-step rollout in which prediction errors compound.
Findings
PolyGRAD achieves lower prediction error than state-of-the-art baselines on short trajectories.
PolyGRAD achieves similar errors to autoregressive models with lower computational requirements.
PolyGRAD enables effective on-policy RL training in MuJoCo environments.
Abstract
World models are a powerful tool for developing intelligent agents. By predicting the outcome of a sequence of actions, world models enable policies to be optimised via on-policy reinforcement learning (RL) using synthetic data, i.e. "in imagination". Existing world models are autoregressive in that they interleave predicting the next state with sampling the next action from the policy. Prediction error inevitably compounds as the trajectory length grows. In this work, we propose a novel world modelling approach that is not autoregressive and generates entire on-policy trajectories in a single pass through a diffusion model. Our approach, Policy-Guided Trajectory Diffusion (PolyGRAD), leverages a denoising model in addition to the gradient of the action distribution of the policy to diffuse a trajectory of initially random states and actions into an on-policy synthetic trajectory. We…
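To make the idea of policy-guided denoising concrete, here is a minimal toy sketch in Python. It is not the authors' implementation. Everything in it is an assumption chosen for illustration: the scalar dynamics s[t+1] = s[t] + a[t], the linear Gaussian policy with mean gain * s, the hand-written `toy_denoiser` standing in for a learned denoising model, and all function names and step sizes. It also omits the noise schedule and noise injection that a real diffusion sampler would use. What it does show is the structure described in the abstract: every state and action in the trajectory starts as random noise and is refined jointly, with actions nudged along the gradient of the policy's log-density.

```python
import numpy as np

def policy_score(states, actions, gain, sigma):
    """Gradient of log N(a; gain * s, sigma^2) with respect to the action a."""
    return (gain * states - actions) / sigma**2

def toy_denoiser(actions, s0):
    """Hypothetical stand-in for a learned denoiser (assumption, not from the paper).

    Predicts the clean state sequence implied by the current actions under
    the toy dynamics s[t+1] = s[t] + a[t], starting from s0.
    """
    return s0 + np.concatenate(([0.0], np.cumsum(actions[:-1])))

def guided_denoise(s0, horizon, gain=-0.5, sigma=0.1,
                   steps=300, state_rate=0.5, action_rate=0.002, seed=0):
    """Refine a whole trajectory at once, starting from random states and actions."""
    rng = np.random.default_rng(seed)
    states = rng.normal(size=horizon)
    actions = rng.normal(size=horizon)
    for _ in range(steps):
        # Move all states part of the way toward the denoiser's prediction.
        states = states + state_rate * (toy_denoiser(actions, s0) - states)
        # Move all actions along the gradient of the policy's log-density.
        actions = actions + action_rate * policy_score(states, actions, gain, sigma)
    return states, actions

states, actions = guided_denoise(s0=1.0, horizon=8)
print(np.round(states, 4))
print(np.round(actions, 4))
```

In this toy setting the refinement settles on a trajectory that is consistent with both the dynamics and the policy mean, without ever rolling the model forward one step at a time. The per-step loop here runs over denoising iterations, not over time steps of the trajectory.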
Taxonomy
Topics: Reinforcement Learning in Robotics
Methods: Diffusion
