Don't Trade Off Safety: Diffusion Regularization for Constrained Offline RL
Junyu Guo, Zhi Zheng, Donghao Ying, Ming Jin, Shangding Gu, Costas Spanos, Javad Lavaei

TL;DR
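This paper introduces DRCORL, an offline RL method that combines a diffusion model of the behavioral policy with gradient manipulation to learn policies that satisfy safety constraints while keeping reward performance high. It targets settings where learning must rely on a fixed dataset because exploration would be unsafe.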
Contribution
The paper proposes a diffusion-based regularization approach for constrained offline RL: a diffusion model captures the behavioral policy in a fixed dataset, and a simplified policy is extracted from it so that safe policies can be learned and run efficiently.
Findings
Achieves reliable safety performance across robot tasks.
Ensures fast inference and strong reward outcomes.
Consistently meets safety constraints with minimal hyperparameter tuning.
Abstract
Constrained reinforcement learning (RL) seeks high-performance policies under safety constraints. We focus on an offline setting where the agent has only a fixed dataset -- common in realistic tasks to prevent unsafe exploration. To address this, we propose Diffusion-Regularized Constrained Offline Reinforcement Learning (DRCORL), which first uses a diffusion model to capture the behavioral policy from offline data and then extracts a simplified policy to enable efficient inference. We further apply gradient manipulation for safety adaptation, balancing the reward objective and constraint satisfaction. This approach leverages high-quality offline data while incorporating safety requirements. Empirical results show that DRCORL achieves reliable safety performance, fast inference, and strong reward outcomes across robot learning tasks. Compared to existing safe offline RL methods, it…
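The abstract does not spell out how the gradient manipulation works, so the following is only a minimal sketch of one common conflict-aware scheme (projecting away the conflicting component of a gradient), not the authors' algorithm. The function names, the `safe_enough` flag, and the rule for which objective gets priority are all assumptions made for illustration.

```python
# Illustrative sketch only: a generic conflict-aware gradient combination.
# All names here are hypothetical; this is not DRCORL's actual update rule.

def dot(u, v):
    """Inner product of two equal-length vectors given as lists."""
    return sum(a * b for a, b in zip(u, v))


def project_out(g, h):
    """Remove from g its component along h (returns a copy of g if h is zero)."""
    hh = dot(h, h)
    if hh == 0.0:
        return list(g)
    scale = dot(g, h) / hh
    return [a - scale * b for a, b in zip(g, h)]


def combine_gradients(g_reward, g_safety, safe_enough):
    """Pick an update direction from two candidate directions.

    g_reward: direction that increases expected reward.
    g_safety: direction that decreases expected cost.
    safe_enough: assumed flag saying the cost constraint currently holds.
    """
    if dot(g_reward, g_safety) >= 0.0:
        # No conflict: both objectives improve along the summed direction.
        return [a + b for a, b in zip(g_reward, g_safety)]
    if safe_enough:
        # Constraint holds: follow reward, minus the part that opposes safety.
        return project_out(g_reward, g_safety)
    # Constraint violated: follow safety, minus the part that opposes reward.
    return project_out(g_safety, g_reward)


g_r = [1.0, 0.0]
g_s = [-1.0, 1.0]
print(combine_gradients(g_r, g_s, safe_enough=True))   # [0.5, 0.5]
print(combine_gradients(g_r, g_s, safe_enough=False))  # [0.0, 1.0]
```

In both conflicting cases the returned direction is orthogonal to the lower-priority gradient, so to first order the update does not make that objective worse. This is one simple way to balance the reward objective against constraint satisfaction.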
Taxonomy
Topics: Risk and Safety Analysis · Safety Systems Engineering in Autonomy · Nuclear and radioactivity studies
