Provably Optimal Reinforcement Learning under Safety Filtering
Donggeon David Oh, Duy P. Nguyen, Haimin Hu, Jaime F. Fisac

TL;DR
This paper proves that a sufficiently permissive safety filter can enforce safety in reinforcement learning without sacrificing asymptotic performance, and it backs the result with empirical validation.
Contribution
It formalizes RL safety as a safety-critical Markov decision process (SC-MDP) and proves that a sufficiently permissive safety filter does not degrade asymptotic performance, which separates safety enforcement from learning.
Findings
Sufficiently permissive safety filters incur no asymptotic performance loss
Theoretical guarantees for safety and convergence in filtered MDPs
Empirical validation shows zero safety violations and high performance
Abstract
Recent advances in reinforcement learning (RL) enable its use on increasingly complex tasks, but the lack of formal safety guarantees still limits its application in safety-critical settings. A common practical approach is to augment the RL policy with a safety filter that overrides unsafe actions to prevent failures during both training and deployment. However, safety filtering is often perceived as sacrificing performance and hindering the learning process. We show that this perceived safety-performance tradeoff is not inherent and prove, for the first time, that enforcing safety with a sufficiently permissive safety filter does not degrade asymptotic performance. We formalize RL safety with a safety-critical Markov decision process (SC-MDP), which requires categorical, rather than high-probability, avoidance of catastrophic failure states. Additionally, we define an associated…
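As a rough illustration of the filtering idea described in the abstract, the sketch below wraps an arbitrary policy with a filter that overrides only those actions that would lead into a failure state. It is a toy under stated assumptions, not the paper's construction: the 1-D corridor, the failure cell, and the names `step`, `is_safe_action`, `fallback_action`, `safety_filter`, and `rollout` are all hypothetical and chosen for illustration.

```python
import random

# Hypothetical toy environment: a corridor of cells 0..N-1.
# Cell 0 is the catastrophic failure state that must never be entered.
FAILURE = 0
N = 6
ACTIONS = (-1, +1)  # move left or move right


def step(state, action):
    """Deterministic transition, clipped to the corridor bounds."""
    return max(0, min(N - 1, state + action))


def is_safe_action(state, action):
    """Permissive check: allow any action whose successor is not the failure state."""
    return step(state, action) != FAILURE


def fallback_action(state):
    """Assumed safe fallback: moving right never enters cell 0."""
    return +1


def safety_filter(state, proposed):
    """Pass the proposed action through unless it is unsafe; otherwise override it."""
    return proposed if is_safe_action(state, proposed) else fallback_action(state)


def rollout(policy, start=3, horizon=50, seed=0):
    """Run the filtered policy and return the list of visited states."""
    rng = random.Random(seed)
    state = start
    visited = [state]
    for _ in range(horizon):
        proposed = policy(state, rng)            # what the learner wants to do
        action = safety_filter(state, proposed)  # what is actually executed
        state = step(state, action)
        visited.append(state)
    return visited


def random_policy(state, rng):
    """Stand-in for an RL policy still exploring: picks actions uniformly."""
    return rng.choice(ACTIONS)


visited = rollout(random_policy)
```

In this toy the filter intervenes only at the cell next to the failure state, so the policy keeps its full freedom everywhere else; that is the sense of "permissive" the sketch is meant to convey.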
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Reinforcement Learning in Robotics · Smart Grid Security and Resilience
