Safe Reinforcement Learning via Confidence-Based Filters
Sebastian Curi, Armin Lederer, Sandra Hirche, Andreas Krause

TL;DR
This paper introduces confidence-based safety filters for reinforcement learning. The filters use probabilistic dynamics models to certify state safety constraints for nominal RL policies, enabling safe deployment with formal, high-probability guarantees.
Contribution
It presents a control-theoretic framework that reformulates state constraints as cost functions, which reduces safety verification to a standard RL task. It then uses hallucinating inputs to derive a backup policy that is safe for the unknown system with high probability.
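The cost reformulation and the role of hallucinating inputs can be illustrated with a one-dimensional toy. This is a minimal sketch under our own assumptions, not the authors' implementation. The safe set |x| <= 1, the model (mean x + u, constant std 0.1), the scale `beta`, the horizon, and the functions `h`, `model`, `backup_policy` and `worst_case_cost` are all hypothetical. The hallucinating input eta in [-1, 1] selects any next state inside the model's confidence interval. The check treats eta as adversarial, so a backup policy with zero worst-case cumulative cost keeps the system safe whenever the true transitions lie inside that interval.

```python
import itertools

import numpy as np


def h(x):
    # Hypothetical state constraint h(x) >= 0, i.e. the safe set |x| <= 1.
    return 1.0 - np.abs(x)


def safety_cost(x):
    # Constraint as a cost: zero inside the safe set, violation size outside.
    return max(0.0, -float(h(x)))


def model(x, u):
    # Hypothetical probabilistic model: predicted mean and std of next state.
    return x + u, 0.1


def hallucinated_step(x, u, eta, beta=2.0):
    # Hallucinating input eta in [-1, 1] selects any next state inside the
    # beta-scaled confidence interval mean +/- beta * std.
    mean, std = model(x, u)
    return mean + beta * std * float(np.clip(eta, -1.0, 1.0))


def backup_policy(x):
    # Hypothetical backup policy steering the state toward the origin.
    return -0.5 * x


def worst_case_cost(x0, policy, horizon=4, beta=2.0):
    # Cumulative safety cost under adversarial hallucinating inputs. Only the
    # endpoints eta in {-1, 1} are enumerated, which is exact for this linear
    # toy with a convex cost but only an approximation in general.
    worst = 0.0
    for etas in itertools.product([-1.0, 1.0], repeat=horizon):
        x, total = x0, 0.0
        for eta in etas:
            x = hallucinated_step(x, policy(x), eta, beta)
            total += safety_cost(x)
        worst = max(worst, total)
    return worst
```

In this toy, `worst_case_cost(0.5, backup_policy)` is zero. Doing nothing from the same state (`lambda x: 0.0`) lets the worst-case state drift past the boundary and incur a positive cost. For nonlinear models, the worst case generally has to be found by optimization instead of enumeration, for example by training eta as an adversarial policy.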
Findings
Formal safety guarantees are provided: the state constraints hold with high probability for the unknown system.
Empirical results demonstrate effectiveness.
At every time step of a roll-out, the nominal policy is minimally adjusted toward the backup policy so that safe recovery remains guaranteed.
Abstract
Ensuring safety is a crucial challenge when deploying reinforcement learning (RL) to real-world systems. We develop confidence-based safety filters, a control-theoretic approach for certifying state safety constraints for nominal policies learned via standard RL techniques, based on probabilistic dynamics models. Our approach is based on a reformulation of state constraints in terms of cost functions, reducing safety verification to a standard RL task. By exploiting the concept of hallucinating inputs, we extend this formulation to determine a "backup" policy that is safe for the unknown system with high probability. Finally, the nominal policy is minimally adjusted at every time step during a roll-out towards the backup policy, such that safe recovery can be guaranteed afterwards. We provide formal safety guarantees, and empirically demonstrate the effectiveness of our approach.
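The last step of the abstract, minimally adjusting the nominal policy toward the backup policy, can be sketched as a search over a mixing weight lam in [0, 1]. This is an illustrative assumption, not the paper's exact filter. We mix actions as u = (1 - lam) * pi_nom(x) + lam * pi_backup(x). We then take the smallest lam on a grid for which every hallucinated next state is inside the safe set and certified recoverable by the backup policy. The sketch uses a hypothetical 1D toy with safe set |x| <= 1, next state x + u +/- 0.2, and backup policy -0.5x. All names, including `filtered_action`, `certified_safe` and `BETA_STD`, are hypothetical.

```python
import numpy as np

# Hypothetical 1D toy: safe set |x| <= 1, model mean x + u, and a
# confidence half-width BETA_STD (beta times the model std), held constant.
BETA_STD = 0.2


def safety_cost(x):
    # Constraint reformulated as a cost: zero inside the safe set.
    return max(0.0, abs(x) - 1.0)


def next_states(x, u):
    # Endpoints of the hallucinated next-state interval (eta = -1, +1).
    return [x + u - BETA_STD, x + u + BETA_STD]


def backup(x):
    # Hypothetical backup policy that steers the state toward the origin.
    return -0.5 * x


def certified_safe(x, horizon=4):
    # True if, starting at x, the backup policy keeps every worst-case
    # trajectory in the safe set for `horizon` steps. Enumerating interval
    # endpoints is exact here (linear dynamics, convex cost).
    if safety_cost(x) > 0.0:
        return False
    frontier = [x]
    for _ in range(horizon):
        frontier = [xn for xs in frontier for xn in next_states(xs, backup(xs))]
        if any(safety_cost(xn) > 0.0 for xn in frontier):
            return False
    return True


def filtered_action(x, nominal, grid=np.linspace(0.0, 1.0, 21)):
    # Minimal adjustment: smallest mixing weight lam that keeps every
    # hallucinated next state certified safe under the backup policy.
    u_nom, u_bak = nominal(x), backup(x)
    for lam in grid:
        u = (1.0 - lam) * u_nom + lam * u_bak
        if all(certified_safe(xn) for xn in next_states(x, u)):
            return float(u), float(lam)
    # No tested mixture passed: apply the backup action outright.
    return float(u_bak), 1.0
```

Consider a nominal policy that always pushes right (u = 1) at x = 0. The filter returns lam ≈ 0.2, the smallest grid value whose worst-case next state stays within the boundary. A nominal action that is already safe passes through unchanged with lam = 0. The grid search trades precision for simplicity. A bisection on lam would also work if certifiability is monotone in lam, but this sketch does not verify that.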
Taxonomy
Topics: Safety Systems Engineering in Autonomy · Software Reliability and Analysis Research · Adversarial Robustness in Machine Learning
