Don't Trade Off Safety: Diffusion Regularization for Constrained Offline RL

Junyu Guo; Zhi Zheng; Donghao Ying; Ming Jin; Shangding Gu; Costas Spanos; Javad Lavaei

arXiv:2502.12391 · cs.LG · September 8, 2025

Open Access 1 Video

TL;DR

This paper introduces DRCORL, a constrained offline RL method that models the behavioral policy with a diffusion model, extracts a simplified policy from it for fast inference, and applies gradient manipulation to balance the reward objective against safety constraints.

Contribution

The paper proposes a diffusion-based regularization approach for constrained offline RL, enabling safe, efficient, and high-quality policy learning from fixed datasets.

Findings

01. Achieves reliable safety performance across robot tasks.

02. Ensures fast inference and strong reward outcomes.

03. Consistently meets safety constraints with minimal hyperparameter tuning.

Abstract

Constrained reinforcement learning (RL) seeks high-performance policies under safety constraints. We focus on an offline setting where the agent has only a fixed dataset -- common in realistic tasks to prevent unsafe exploration. To address this, we propose Diffusion-Regularized Constrained Offline Reinforcement Learning (DRCORL), which first uses a diffusion model to capture the behavioral policy from offline data and then extracts a simplified policy to enable efficient inference. We further apply gradient manipulation for safety adaptation, balancing the reward objective and constraint satisfaction. This approach leverages high-quality offline data while incorporating safety requirements. Empirical results show that DRCORL achieves reliable safety performance, fast inference, and strong reward outcomes across robot learning tasks. Compared to existing safe offline RL methods, it…
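
The gradient-manipulation step mentioned in the abstract can be illustrated with a small sketch. The abstract does not state the update rule, so everything below is an assumption: a PCGrad-style projection that removes the conflicting component when the reward and safety directions disagree, with a hypothetical `margin` band around the cost limit. The function name `combine_gradients` and all of its parameters are illustrative and are not taken from the paper.

```python
import numpy as np


def combine_gradients(g_reward, g_cost, cost_estimate, cost_limit, margin=0.1):
    """Return an ascent direction for the policy parameters.

    Hypothetical rule (not the paper's): follow the reward gradient when
    clearly safe, the negative cost gradient when clearly unsafe, and a
    conflict-free blend of both near the constraint boundary.
    """
    g_r = np.asarray(g_reward, dtype=float)
    g_safe = -np.asarray(g_cost, dtype=float)  # direction that lowers cost

    if cost_estimate <= cost_limit - margin:
        return g_r       # clearly safe: optimize reward only
    if cost_estimate >= cost_limit + margin:
        return g_safe    # clearly unsafe: reduce cost only

    dot = g_r @ g_safe
    if dot >= 0:
        # No conflict: a plain average improves both objectives.
        return 0.5 * (g_r + g_safe)

    # Conflict: project each direction onto the plane orthogonal to the
    # other, then average. dot < 0 implies both vectors are nonzero.
    g_r_proj = g_r - dot / (g_safe @ g_safe) * g_safe
    g_s_proj = g_safe - dot / (g_r @ g_r) * g_r
    return 0.5 * (g_r_proj + g_s_proj)


# Usage: conflicting gradients with the cost estimate at the limit.
step = combine_gradients([1.0, 0.0], [1.0, -1.0], cost_estimate=1.0, cost_limit=1.0)
```

In this toy case the returned direction has a non-negative inner product with both the reward direction and the cost-reducing direction, which is the property the projection is meant to provide.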

Videos

Don’t Trade Off Safety: Diffusion Regularization for Constrained Offline RL · SlidesLive

Taxonomy

Topics: Risk and Safety Analysis · Safety Systems Engineering in Autonomy · Nuclear and radioactivity studies