LexiSafe: Offline Safe Reinforcement Learning with Lexicographic Safety-Reward Hierarchy

Hsin-Jung Yang; Zhanhong Jiang; Prajwal Koirala; Qisai Liu; Cody Fleming; Soumik Sarkar

arXiv:2602.17312 · cs.LG · March 12, 2026

TL;DR

LexiSafe is a lexicographic offline safe reinforcement learning framework that ranks safety above reward. It comes with sample-complexity guarantees and reports fewer safety violations and better task performance than baselines in cyber-physical systems settings.

Contribution

The paper proposes a lexicographic safety-reward hierarchy for offline safe RL. It derives safety-violation and performance-suboptimality bounds for the single-cost formulation (LexiSafe-SC) and extends the framework to multiple safety costs (LexiSafe-MC).

Findings

1. Reduces safety violations compared to baselines
2. Improves task performance in safety-critical settings
3. Provides sample-complexity guarantees for safety and performance

Abstract

Offline safe reinforcement learning (RL) is increasingly important for cyber-physical systems (CPS), where safety violations during training are unacceptable and only pre-collected data are available. Existing offline safe RL methods typically balance reward-safety tradeoffs through constraint relaxation or joint optimization, but they often lack structural mechanisms to prevent safety drift. We propose LexiSafe, a lexicographic offline RL framework designed to preserve safety-aligned behavior. We first develop LexiSafe-SC, a single-cost formulation for standard offline safe RL, and derive safety-violation and performance-suboptimality bounds that together yield sample-complexity guarantees. We then extend the framework to hierarchical safety requirements with LexiSafe-MC, which supports multiple safety costs and admits its own sample-complexity analysis. Empirically, LexiSafe…
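
To make the lexicographic idea concrete, here is a minimal sketch of lexicographic selection over a handful of candidate actions: filter by each safety cost in priority order, then maximize reward among the survivors. This illustrates the general ordering principle only and is not the authors' training algorithm. The function name `lexicographic_choice`, the `slack` tolerance, and the numeric values are all assumptions made for this example.

```python
from typing import Sequence


def lexicographic_choice(
    cost_levels: Sequence[Sequence[float]],
    rewards: Sequence[float],
    slack: float = 0.0,
) -> int:
    """Pick an action index by lexicographic priority (hypothetical helper).

    cost_levels[k][i] is the estimated k-th safety cost of action i, with
    k = 0 the highest-priority cost. At each level, only actions whose cost
    is within `slack` of the best remaining cost survive. Reward is used
    last, to choose among the survivors.
    """
    if not rewards or any(len(c) != len(rewards) for c in cost_levels):
        raise ValueError("every cost level must have one entry per action")

    candidates = list(range(len(rewards)))
    for costs in cost_levels:
        best = min(costs[i] for i in candidates)
        candidates = [i for i in candidates if costs[i] <= best + slack]
    return max(candidates, key=lambda i: rewards[i])


# Illustrative numbers only (assumed, not taken from the paper).
costs = [0.9, 0.1, 0.15, 0.1]
rewards = [5.0, 1.0, 3.0, 2.0]

# Single-cost case: safety first, reward second.
strict = lexicographic_choice([costs], rewards)              # action 3
relaxed = lexicographic_choice([costs], rewards, slack=0.1)  # action 2

# A weighted sum (reward minus cost) prefers the high-cost action 0 instead.
weighted = max(range(len(rewards)), key=lambda i: rewards[i] - costs[i])

# Multi-cost case: the second cost only breaks ties left by the first.
multi = lexicographic_choice(
    [[0.1, 0.1, 0.5], [0.3, 0.2, 0.0]], [9.0, 1.0, 10.0]
)  # action 1
```

The contrast with `weighted` shows why a strict ordering can behave differently from joint optimization: a large enough reward can outweigh cost in a scalarized objective, whereas under the lexicographic rule reward never compensates for a worse safety cost beyond the chosen slack.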

Taxonomy

Topics: Smart Grid Security and Resilience · Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning