Beyond Hard Constraints: Budget-Conditioned Reachability For Safe Offline Reinforcement Learning

Janaka Chathuranga Brahmanage; Akshat Kumar

arXiv:2603.22292·cs.LG·April 1, 2026

TL;DR

This paper introduces a novel offline safe reinforcement learning method that uses safety-conditioned reachability sets to ensure safety without unstable optimization, demonstrated on benchmarks and maritime navigation.

Contribution

It extends reachability analysis to cumulative safety constraints and develops a safe RL algorithm that learns from fixed datasets without environment interaction.

Findings

01. Outperforms state-of-the-art baselines on offline safe RL benchmarks.

02. Maintains safety in a real-world maritime navigation task.

03. Avoids unstable min/max and Lagrangian optimization in safety enforcement.

Abstract

Sequential decision making using Markov Decision Processes underpins many real-world applications. Both model-based and model-free methods have achieved strong results in these settings. However, real-world tasks must balance reward maximization with safety constraints, which are often conflicting objectives and can lead to unstable min/max adversarial optimization. A promising alternative is safety reachability analysis, which precomputes a forward-invariant safe state-action set, ensuring that an agent starting inside this set remains safe indefinitely. Yet most reachability-based methods address only hard safety constraints, and little work extends reachability to cumulative cost constraints. To address this, first, we define a safety-conditioned reachability set that decouples reward maximization from cumulative safety cost constraints. Second, we show how this set enforces safety…
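To make the idea of a reachability set conditioned on a cost budget more concrete, here is a minimal tabular sketch. It is not the paper's algorithm. It assumes a small, known, deterministic MDP with undiscounted costs, and the names `min_cost_to_go`, `budget_safe_actions`, `next_state`, and `cost` are hypothetical, chosen for illustration only. The sketch computes the smallest cumulative cost achievable from each state-action pair, then treats a pair as feasible for a given remaining budget when that minimal cost fits within the budget.

```python
import numpy as np

def min_cost_to_go(next_state, cost, n_iters=200):
    """Smallest achievable undiscounted cumulative cost from each (s, a).

    Assumes deterministic transitions given by next_state[s, a] and
    non-negative per-step costs cost[s, a] (illustrative assumptions).
    """
    q = np.zeros(cost.shape)
    for _ in range(n_iters):
        v = q.min(axis=1)           # lowest cost-to-go from each state
        q = cost + v[next_state]    # Bellman backup on cost only
    return q

def budget_safe_actions(q_cost, state, budget):
    """Actions whose minimal cost-to-go fits within the remaining budget."""
    return np.flatnonzero(q_cost[state] <= budget)

# Toy 3-state, 2-action MDP (hypothetical); state 2 is absorbing and cost-free.
next_state = np.array([[1, 2],
                       [2, 2],
                       [2, 2]])
cost = np.array([[0.0, 2.0],
                 [1.0, 3.0],
                 [0.0, 0.0]])

q_cost = min_cost_to_go(next_state, cost)

# With a budget of 1.0 only action 0 is feasible in state 0;
# raising the budget to 2.0 admits both actions.
tight = budget_safe_actions(q_cost, state=0, budget=1.0)
loose = budget_safe_actions(q_cost, state=0, budget=2.0)
```

In this sketch the feasible set depends only on cost, so a reward-maximizing policy could be restricted to the returned actions and the remaining budget reduced by `cost[s, a]` after each step. That mirrors, in a simplified form, the decoupling of reward maximization from the cumulative cost constraint that the abstract describes.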
