Beyond Hard Constraints: Budget-Conditioned Reachability For Safe Offline Reinforcement Learning
Janaka Chathuranga Brahmanage, Akshat Kumar

TL;DR
This paper introduces an offline safe reinforcement learning method that uses safety-conditioned reachability sets to enforce safety without unstable min/max or Lagrangian optimization, demonstrated on standard offline safe RL benchmarks and a real-world maritime navigation task.
Contribution
It extends reachability analysis to cumulative safety constraints and develops a safe RL algorithm that learns from fixed datasets without environment interaction.
Findings
Outperforms state-of-the-art baselines on offline safe RL benchmarks.
Maintains safety in a real-world maritime navigation task.
Avoids unstable min/max and Lagrangian optimization in safety enforcement.
Abstract
Sequential decision-making using Markov Decision Processes (MDPs) underpins many real-world applications. Both model-based and model-free methods have achieved strong results in these settings. However, real-world tasks must balance reward maximization against safety constraints; these objectives often conflict and can lead to unstable min/max, adversarial optimization. A promising alternative is safety reachability analysis, which precomputes a forward-invariant safe state-action set, ensuring that an agent starting inside this set remains safe indefinitely. Yet most reachability-based methods address only hard safety constraints, and little work extends reachability to cumulative cost constraints. To address this, we first define a safety-conditioned reachability set that decouples reward maximization from cumulative safety cost constraints. Second, we show how this set enforces safety…
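To make the idea of a budget-conditioned reachability set concrete, here is a minimal tabular sketch. It is not the authors' algorithm, which learns from a fixed offline dataset; it assumes a small, known, deterministic MDP with non-negative costs and an absorbing goal state, and every name in it (min_cost_to_go, safe_actions, rollout, the toy transition table) is hypothetical. A state-budget pair (s, b) is treated as inside the set when the minimum achievable future cost V_c(s) <= b, and an action a is kept only if c(s, a) + V_c(s') <= b. That filter makes the set forward-invariant: after paying c(s, a), the new pair (s', b - c(s, a)) still satisfies V_c(s') <= b - c(s, a). Reward maximization is decoupled from the constraint; here it is just greedy on immediate reward over the safe actions, which is illustrative and not optimal in general.

```python
import math
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical toy MDP format: (state, action) -> (next_state, cost, reward).
Transitions = Dict[Tuple[int, int], Tuple[int, float, float]]


def min_cost_to_go(n_states: int, terminals: Set[int], trans: Transitions) -> List[float]:
    """Smallest cumulative cost achievable from each state before reaching a terminal.

    Bellman-Ford-style sweeps; assumes deterministic transitions and non-negative costs.
    """
    v = [0.0 if s in terminals else math.inf for s in range(n_states)]
    for _ in range(n_states):  # n sweeps are enough for shortest paths over n states
        for (s, _a), (s_next, cost, _r) in trans.items():
            if s not in terminals:
                v[s] = min(v[s], cost + v[s_next])
    return v


def safe_actions(s: int, budget: float, trans: Transitions, v_c: List[float]) -> List[int]:
    """Actions that keep (next_state, remaining_budget) inside the reachability set."""
    return [a for (s0, a), (s_next, cost, _r) in trans.items()
            if s0 == s and cost + v_c[s_next] <= budget]


def rollout(start: int, budget: float, terminals: Set[int], trans: Transitions,
            v_c: List[float], max_steps: int = 100) -> Optional[Tuple[List[int], float, float]]:
    """Greedy-on-reward rollout restricted to budget-conditioned safe actions."""
    if v_c[start] > budget:
        return None  # (start, budget) lies outside the reachability set
    s, path, total_cost, total_reward = start, [start], 0.0, 0.0
    for _ in range(max_steps):
        if s in terminals:
            break
        # Non-empty by forward invariance: the cost-minimizing action always qualifies.
        acts = safe_actions(s, budget, trans, v_c)
        # Reward choice is decoupled from safety; greedy immediate reward is illustrative only.
        a = max(acts, key=lambda act: trans[(s, act)][2])
        s_next, cost, reward = trans[(s, a)]
        budget -= cost
        total_cost += cost
        total_reward += reward
        s = s_next
        path.append(s)
    return path, total_cost, total_reward


# Toy example: action 0 is "fast but costly", action 1 is "slow but cheap"; state 3 is the goal.
TOY_TRANS: Transitions = {
    (0, 0): (3, 5.0, 10.0), (0, 1): (1, 1.0, 1.0),
    (1, 0): (3, 3.0, 5.0),  (1, 1): (2, 0.0, 1.0),
    (2, 0): (3, 1.0, 2.0),  (2, 1): (3, 1.0, 2.0),
}
TERMINALS = {3}
V_C = min_cost_to_go(4, TERMINALS, TOY_TRANS)

if __name__ == "__main__":
    print(V_C)                                         # [2.0, 1.0, 1.0, 0.0]
    print(rollout(0, 2.0, TERMINALS, TOY_TRANS, V_C))  # ([0, 1, 2, 3], 2.0, 4.0)
    print(rollout(0, 5.0, TERMINALS, TOY_TRANS, V_C))  # ([0, 3], 5.0, 10.0)
    print(rollout(0, 1.0, TERMINALS, TOY_TRANS, V_C))  # None
```

The same starting state behaves differently depending on the budget it is conditioned on: with a budget of 2 only the cheap detour is admissible, with a budget of 5 the high-reward shortcut becomes safe, and with a budget of 1 the start lies outside the set altogether, so no policy could satisfy the constraint from there.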
