Decoupled Guidance Diffusion for Adaptive Offline Safe Reinforcement Learning
Rufeng Chen, Zhaofan Zhang, Zhejiang Yang, Hechang Chen, Sihong Xie

TL;DR
This paper introduces Safe Decoupled Guidance Diffusion (SDGD), a diffusion-based method for adaptive offline safe reinforcement learning that separates cost-limit conditioning from reward guidance during trajectory generation.
Contribution
The paper proposes SDGD, which conditions classifier-free guidance on the cost limit and uses reward shaping to improve safety compliance and reward performance in offline safe RL.
Findings
SDGD achieves safety constraint satisfaction on 94.7% of tasks in the DSRL benchmark.
SDGD outperforms baselines in safety compliance and reward on multiple tasks.
FTR effectively suppresses reward-induced cost drift under certain conditions.
Abstract
Offline safe reinforcement learning often requires policies to adapt at deployment time to safety budgets that vary across episodes or change within a single episode. While diffusion-based planners enable flexible trajectory generation, existing guidance schemes often treat reward improvement and constraint satisfaction as competing gradient objectives, which can lead to unreliable safety compliance under cost limits. We reinterpret adaptive safe trajectory generation as sampling from a constrained trajectory distribution, where the budget restricts the trajectory region, and reward shapes preferences within that region. This perspective motivates Safe Decoupled Guidance Diffusion (SDGD), which conditions classifier-free guidance on the cost limit to bias sampling toward trajectories satisfying the specified limit, while using reward-gradient guidance to refine trajectories for higher…
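The decoupling described in the abstract can be illustrated with a toy denoising step. This is a minimal sketch, not the authors' implementation: the function names (`cfg_noise`, `guided_step`), the parameters (`w`, `reward_scale`, `step_size`), and the simple additive update are all assumptions made for illustration, and the abstract is truncated before it states how SDGD actually combines the two signals. The sketch assumes a noise-prediction model has already produced one prediction conditioned on the cost limit and one unconditional prediction, and that a reward gradient with respect to the trajectory is available.

```python
import numpy as np


def cfg_noise(eps_cond, eps_uncond, w):
    """Classifier-free guidance on the noise prediction.

    eps_cond is the prediction conditioned on the cost limit (assumed input),
    eps_uncond is the unconditional prediction, and w is a hypothetical
    guidance weight. w = 0 ignores the cost limit; w = 1 uses the
    conditional prediction unchanged; w > 1 extrapolates past it.
    """
    return eps_uncond + w * (eps_cond - eps_uncond)


def guided_step(x, eps_cond, eps_uncond, reward_grad,
                w=2.0, reward_scale=0.1, step_size=0.1):
    """One toy update that keeps the two guidance signals separate.

    The cost limit acts only through the classifier-free term; the reward
    acts only through an additive gradient nudge. This simplified update
    stands in for a real sampler step and omits the noise schedule.
    """
    eps = cfg_noise(eps_cond, eps_uncond, w)
    x_denoised = x - step_size * eps              # cost-limit-conditioned move
    return x_denoised + reward_scale * reward_grad  # reward refinement


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(4, 3))          # toy trajectory: 4 steps, 3 dims
    eps_c = rng.normal(size=x.shape)     # stand-in for conditional prediction
    eps_u = rng.normal(size=x.shape)     # stand-in for unconditional prediction
    grad = rng.normal(size=x.shape)      # stand-in for reward gradient
    print(guided_step(x, eps_c, eps_u, grad).shape)
```

Keeping the cost limit in the conditioning path and the reward in a separate gradient term mirrors the abstract's framing that the budget restricts the trajectory region while reward shapes preferences within it.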
