Advantage-Guided Diffusion for Model-Based Reinforcement Learning

Daniele Foffano; Arvid Eriksson; David Broman; Karl H. Johansson; Alexandre Proutiere

arXiv:2604.09035 · cs.AI · April 13, 2026

TL;DR

This paper introduces Advantage-Guided Diffusion for Model-Based Reinforcement Learning (AGD-MBRL), which uses the agent's advantage estimates to steer a diffusion world model toward trajectories with higher long-term return, improving sample efficiency and performance.

Contribution

The paper proposes an advantage-guided diffusion method with two guides, Sigmoid Advantage Guidance (SAG) and Exponential Advantage Guidance (EAG), which enable policy improvement without changing the diffusion training objective.

Findings

01. AGD-MBRL outperforms baselines on MuJoCo tasks.

02. Advantage guidance improves sample efficiency and final returns.

03. Generated trajectories follow higher-value policies.

Abstract

Model-based reinforcement learning (MBRL) with autoregressive world models suffers from compounding errors, whereas diffusion world models mitigate this by generating trajectory segments jointly. However, existing diffusion guides are either policy-only, discarding value information, or reward-based, which becomes myopic when the diffusion horizon is short. We introduce Advantage-Guided Diffusion for MBRL (AGD-MBRL), which steers the reverse diffusion process using the agent's advantage estimates so that sampling concentrates on trajectories expected to yield higher long-term return beyond the generated window. We develop two guides: (i) Sigmoid Advantage Guidance (SAG) and (ii) Exponential Advantage Guidance (EAG). We prove that a diffusion model guided through SAG or EAG allows us to perform reweighted sampling of trajectories with weights increasing in state-action advantage, implying…
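
The reweighting idea in the abstract can be illustrated with a small Python sketch. It is not the paper's method: it reweights a finite set of candidate trajectories instead of guiding the reverse diffusion process, and the weight forms, the temperature `tau`, and every function name here are assumptions made for illustration, since the abstract does not give the exact form of the SAG and EAG guides.

```python
import math
import random


def sigmoid_weight(advantage, tau=1.0):
    """Assumed SAG-style weight: sigmoid of advantage / tau (numerically stable)."""
    x = advantage / tau
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def exponential_weight(advantage, tau=1.0):
    """Assumed EAG-style weight: exp(advantage / tau)."""
    return math.exp(advantage / tau)


def normalize(weights):
    """Turn non-negative weights into a probability distribution."""
    total = sum(weights)
    return [w / total for w in weights]


def reweighted_mean_advantage(advantages, weight_fn, tau=1.0):
    """Mean advantage when candidates are drawn in proportion to their weights."""
    probs = normalize([weight_fn(a, tau) for a in advantages])
    return sum(p * a for p, a in zip(probs, advantages))


def resample(candidates, advantages, weight_fn, k, tau=1.0, seed=0):
    """Draw k candidates with probability proportional to weight_fn(advantage)."""
    rng = random.Random(seed)
    weights = [weight_fn(a, tau) for a in advantages]
    return rng.choices(candidates, weights=weights, k=k)


# Hypothetical advantage estimates for four candidate trajectories.
advantages = [-1.0, 0.0, 0.5, 2.0]
candidates = ["traj_a", "traj_b", "traj_c", "traj_d"]

uniform_mean = sum(advantages) / len(advantages)
sag_mean = reweighted_mean_advantage(advantages, sigmoid_weight)
eag_mean = reweighted_mean_advantage(advantages, exponential_weight)
samples = resample(candidates, advantages, exponential_weight, k=5)

print(f"uniform: {uniform_mean:.3f}  sigmoid: {sag_mean:.3f}  exponential: {eag_mean:.3f}")
```

Because both weight functions increase with the advantage, the reweighted mean advantage is at least the uniform mean. In this toy setting the exponential weight concentrates more sharply on the best candidates than the bounded sigmoid weight, and a smaller `tau` sharpens both.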
