Provably Optimal Reinforcement Learning under Safety Filtering

Donggeon David Oh; Duy P. Nguyen; Haimin Hu; Jaime F. Fisac

arXiv:2510.18082 · cs.LG · February 12, 2026

Open Access

TL;DR

This paper proves that a sufficiently permissive safety filter can enforce safety in reinforcement learning without degrading asymptotic performance, giving safe RL a theoretical foundation backed by empirical validation.

Contribution

It formalizes RL safety with a safety-critical Markov decision process (SC-MDP) and proves that a sufficiently permissive safety filter does not degrade asymptotic performance, which separates safety enforcement from learning.

Findings

01. Sufficiently permissive safety filters incur no loss in asymptotic performance

02. Theoretical guarantees for safety and convergence in filtered MDPs

03. Empirical validation shows zero safety violations and high performance

Abstract

Recent advances in reinforcement learning (RL) enable its use on increasingly complex tasks, but the lack of formal safety guarantees still limits its application in safety-critical settings. A common practical approach is to augment the RL policy with a safety filter that overrides unsafe actions to prevent failures during both training and deployment. However, safety filtering is often perceived as sacrificing performance and hindering the learning process. We show that this perceived safety-performance tradeoff is not inherent and prove, for the first time, that enforcing safety with a sufficiently permissive safety filter does not degrade asymptotic performance. We formalize RL safety with a safety-critical Markov decision process (SC-MDP), which requires categorical, rather than high-probability, avoidance of catastrophic failure states. Additionally, we define an associated…
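
The filtering idea described in the abstract, where a filter overrides unsafe actions proposed by the policy, can be illustrated with a minimal sketch. This is not the paper's construction: the line-world, its dynamics, and every name below (`step`, `compute_safe_set`, `safety_filter`, `rollout`) are hypothetical, and a uniformly random policy stands in for an RL agent. The filter here is as permissive as this toy setting allows, since it passes every action whose successor stays inside the largest set of states from which failure can be avoided indefinitely.

```python
import random

# Hypothetical toy world: states 0..9 on a line, state 0 is catastrophic.
N_STATES = 10
FAILURE = {0}
ACTIONS = (-2, -1, +1)


def step(state, action):
    """Deterministic toy dynamics, clipped to the valid state range."""
    return max(0, min(N_STATES - 1, state + action))


def compute_safe_set():
    """Largest set of non-failure states from which some action stays in the set."""
    safe = set(range(N_STATES)) - FAILURE
    changed = True
    while changed:
        changed = False
        for s in list(safe):
            # Drop states where every action leaves the current candidate set.
            if not any(step(s, a) in safe for a in ACTIONS):
                safe.remove(s)
                changed = True
    return safe


SAFE = compute_safe_set()


def safety_filter(state, proposed):
    """Pass the proposed action through if it stays safe; otherwise override it."""
    if step(state, proposed) in SAFE:
        return proposed
    # Fallback: the first action that keeps the system inside the safe set.
    for a in ACTIONS:
        if step(state, a) in SAFE:
            return a
    raise ValueError("state is outside the safe set")


def rollout(steps=1000, seed=0):
    """Run a random policy through the filter; count failures and overrides."""
    rng = random.Random(seed)
    state, failures, overrides = 5, 0, 0
    for _ in range(steps):
        proposed = rng.choice(ACTIONS)
        action = safety_filter(state, proposed)
        overrides += action != proposed
        state = step(state, action)
        failures += state in FAILURE
    return failures, overrides


if __name__ == "__main__":
    failures, overrides = rollout()
    print(f"failures={failures}, overrides={overrides}")
```

Because the filter intervenes only when the proposed action would leave the safe set, the policy keeps its full action choice everywhere else. The sketch shows only the override mechanism and the absence of failures in this toy world; it does not demonstrate the asymptotic-performance result.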

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Reinforcement Learning in Robotics · Smart Grid Security and Resilience