Advantage-Guided Diffusion for Model-Based Reinforcement Learning
Daniele Foffano, Arvid Eriksson, David Broman, Karl H. Johansson, Alexandre Proutiere

TL;DR
This paper introduces Advantage-Guided Diffusion for Model-Based Reinforcement Learning (AGD-MBRL), which uses the agent's advantage estimates to steer a diffusion world model toward trajectories with higher long-term return, improving sample efficiency and performance.
Contribution
It proposes an advantage-guided diffusion method with two guides, Sigmoid Advantage Guidance (SAG) and Exponential Advantage Guidance (EAG), enabling policy improvement without changing the diffusion training objective.
Findings
AGD-MBRL outperforms baselines on MuJoCo tasks.
Advantage guidance improves sample efficiency and final returns.
Generated trajectories follow higher-value policies.
Abstract
Model-based reinforcement learning (MBRL) with autoregressive world models suffers from compounding errors, whereas diffusion world models mitigate this by generating trajectory segments jointly. However, existing diffusion guides are either policy-only, discarding value information, or reward-based, which becomes myopic when the diffusion horizon is short. We introduce Advantage-Guided Diffusion for MBRL (AGD-MBRL), which steers the reverse diffusion process using the agent's advantage estimates so that sampling concentrates on trajectories expected to yield higher long-term return beyond the generated window. We develop two guides: (i) Sigmoid Advantage Guidance (SAG) and (ii) Exponential Advantage Guidance (EAG). We prove that a diffusion model guided through SAG or EAG allows us to perform reweighted sampling of trajectories with weights increasing in state-action advantage, implying…
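The reweighted-sampling idea in the abstract can be illustrated with a small sketch. The abstract does not give the exact form of the SAG and EAG weights, so the functions below are assumptions: a sigmoid and an exponential of the advantage, each with a hypothetical `temperature` parameter. The sketch only shows that both weightings increase with advantage and how they redistribute probability over a few candidate trajectories; it is not the paper's guided reverse-diffusion procedure.

```python
import math

def sigmoid_weight(advantage, temperature=1.0):
    """Assumed SAG-style weight: bounded in (0, 1), increasing in advantage."""
    return 1.0 / (1.0 + math.exp(-advantage / temperature))

def exponential_weight(advantage, temperature=1.0):
    """Assumed EAG-style weight: unbounded, increasing in advantage."""
    return math.exp(advantage / temperature)

def normalize(weights):
    """Turn non-negative weights into a sampling distribution."""
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical advantage estimates for three candidate trajectories.
advantages = [-1.0, 0.0, 2.0]

sag_probs = normalize([sigmoid_weight(a) for a in advantages])
eag_probs = normalize([exponential_weight(a) for a in advantages])

for a, p_sag, p_eag in zip(advantages, sag_probs, eag_probs):
    print(f"advantage={a:+.1f}  sigmoid-weighted={p_sag:.3f}  exp-weighted={p_eag:.3f}")
```

Under these assumed forms, the exponential weighting concentrates more mass on the highest-advantage trajectory, while the bounded sigmoid weighting stays closer to the unweighted distribution.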
