TL;DR
This paper introduces Prior Guidance (PG), a guided sampling method for diffusion-based planning in offline reinforcement learning. PG replaces the standard Gaussian prior of a behavior-cloned diffusion model with a learnable distribution, which the authors report improves trajectory quality and lowers inference cost relative to existing guided sampling strategies.
Contribution
The paper proposes Prior Guidance, a guided sampling framework for diffusion planning in offline RL. Instead of sampling from a fixed Gaussian prior, it learns the prior distribution under a behavior-regularized objective, which reduces inference cost.
Findings
Outperforms state-of-the-art diffusion policies and planners
Efficient training with behavior regularization in latent space
Achieves better long-horizon decision-making on benchmarks
Abstract
Diffusion models have recently gained prominence in offline reinforcement learning due to their ability to effectively learn high-performing, generalizable policies from static datasets. Diffusion-based planners facilitate long-horizon decision-making by generating high-quality trajectories through iterative denoising, guided by return-maximizing objectives. However, existing guided sampling strategies such as Classifier Guidance, Classifier-Free Guidance, and Monte Carlo Sample Selection either produce suboptimal multi-modal actions, struggle with distributional drift, or incur prohibitive inference-time costs. To address these challenges, we propose Prior Guidance (PG), a novel guided sampling framework that replaces the standard Gaussian prior of a behavior-cloned diffusion model with a learnable distribution, optimized via a behavior-regularized objective. PG directly generates…
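The idea of swapping a fixed Gaussian prior for a learnable one can be illustrated with a toy sketch. The abstract excerpt does not state the actual objective, so everything below is an assumption made for illustration: the frozen behavior-cloned denoiser is replaced by a hypothetical `toy_decoder`, the return by a hypothetical `toy_return`, the learnable prior is a one-dimensional Gaussian N(mu, sigma^2), and behavior regularization is modeled as a KL penalty toward the original N(0, 1) prior with a hypothetical weight `alpha`. It is not the paper's implementation.

```python
import math
import random


def toy_decoder(z):
    # Hypothetical stand-in for a frozen, behavior-cloned denoiser:
    # deterministically maps a latent noise value to a "trajectory" value.
    return math.tanh(z)


def toy_return(x):
    # Hypothetical return signal: highest when the decoded value is 0.8.
    return -(x - 0.8) ** 2


def kl_to_standard_normal(mu, log_sigma):
    # Closed-form KL( N(mu, sigma^2) || N(0, 1) ), used here as an assumed
    # behavior-regularization term that keeps the learned prior near the original.
    sigma2 = math.exp(2.0 * log_sigma)
    return 0.5 * (sigma2 + mu ** 2 - 1.0 - 2.0 * log_sigma)


def objective(mu, log_sigma, eps, alpha):
    # Average return of decoded samples minus the weighted KL penalty.
    sigma = math.exp(log_sigma)
    avg_return = sum(toy_return(toy_decoder(mu + sigma * e)) for e in eps) / len(eps)
    return avg_return - alpha * kl_to_standard_normal(mu, log_sigma)


def fit_prior(n_samples=512, alpha=0.1, lr=0.05, steps=300, seed=0):
    rng = random.Random(seed)
    # Fixed noise draws (reparameterization trick): z = mu + sigma * eps.
    eps = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    mu, log_sigma = 0.0, 0.0  # start at the standard Gaussian prior
    j_init = objective(mu, log_sigma, eps, alpha)
    for _ in range(steps):
        sigma = math.exp(log_sigma)
        # Gradients of the KL penalty with respect to mu and log_sigma.
        grad_mu = -alpha * mu
        grad_ls = -alpha * (sigma ** 2 - 1.0)
        for e in eps:
            t = math.tanh(mu + sigma * e)
            # d(return)/dz through the frozen decoder (chain rule on tanh).
            d_return_dz = -2.0 * (t - 0.8) * (1.0 - t ** 2)
            grad_mu += d_return_dz / n_samples
            grad_ls += d_return_dz * sigma * e / n_samples
        # Gradient ascent on the prior parameters only; the decoder stays frozen.
        mu += lr * grad_mu
        log_sigma += lr * grad_ls
    return mu, log_sigma, j_init, objective(mu, log_sigma, eps, alpha)


if __name__ == "__main__":
    mu, log_sigma, j_init, j_final = fit_prior()
    print(f"mu={mu:.3f} sigma={math.exp(log_sigma):.3f} "
          f"objective: {j_init:.3f} -> {j_final:.3f}")
```

In this sketch only the prior parameters are trained, so sampling after training is a single draw from the learned prior followed by one pass through the frozen decoder. Raising `alpha` pulls the learned prior back toward N(0, 1), which is one simple way to trade return against staying close to the behavior distribution.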
Taxonomy
Methods: Diffusion
