Counterexample-Guided Repair of Reinforcement Learning Systems Using Safety Critics
David Boetius, Stefan Leue

TL;DR
This paper introduces a counterexample-guided repair method for reinforcement learning agents that uses safety critics and gradient-based optimization to fix unsafe behaviors without retraining from scratch.
Contribution
It presents a novel repair algorithm that jointly adjusts reinforcement learning agents and safety critics through constrained optimization.
Findings
Effective repair of unsafe behaviors in RL agents
Joint optimization improves safety and performance
Reduces need for extensive retraining
Abstract
Naively trained Deep Reinforcement Learning agents may fail to satisfy vital safety constraints. To avoid costly retraining, we may desire to repair a previously trained reinforcement learning agent to obviate unsafe behaviour. We devise a counterexample-guided repair algorithm for repairing reinforcement learning systems leveraging safety critics. The algorithm jointly repairs a reinforcement learning agent and a safety critic using gradient-based constrained optimisation.
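The abstract does not spell out the algorithm, so the following is only a minimal sketch of what a counterexample-guided repair loop can look like, under loudly stated assumptions. Everything in it is hypothetical and chosen for illustration: the one-dimensional state, the linear policy with gain `w`, the one-step dynamics, the bound `LIMIT`, the grid-search falsifier, and the names `safety_margin`, `find_counterexample`, and `repair`. In particular, `safety_margin` is a hand-written stand-in for a learned safety critic, and the sketch does not model the joint repair of agent and critic described in the paper. The gradient step is a plain penalty step that stops once all collected counterexamples are satisfied, not the authors' constrained optimiser.

```python
# Toy setup (all assumptions): state s in [-1, 1], policy a = w * s,
# one-step dynamics s' = s + a, and the spec requires |s'| <= LIMIT.
LIMIT = 0.5
SLACK = 0.01  # repair slightly past the bound so the falsifier is satisfied


def policy(w, s):
    return w * s


def safety_margin(w, s):
    # Stand-in for a safety critic: a non-negative value means "safe".
    return LIMIT - abs(s + policy(w, s))


def find_counterexample(w, grid=201):
    # Falsifier: grid search for the state with the smallest safety margin.
    states = [-1 + 2 * i / (grid - 1) for i in range(grid)]
    worst = min(states, key=lambda s: safety_margin(w, s))
    return worst if safety_margin(w, worst) < 0 else None


def repair(w0, lr=0.05, max_rounds=50, inner_steps=200):
    w, counterexamples = w0, []
    for _ in range(max_rounds):
        cx = find_counterexample(w)
        if cx is None:
            # No violation found on the grid: treat the policy as repaired.
            return w, counterexamples
        counterexamples.append(cx)
        for _ in range(inner_steps):
            grad, violated = 0.0, False
            for s in counterexamples:
                nxt = s + policy(w, s)
                if abs(nxt) > LIMIT - SLACK:
                    violated = True
                    # d/dw |s * (1 + w)| = sign(nxt) * s
                    grad += (1.0 if nxt > 0 else -1.0) * s
            if not violated:
                break  # every collected counterexample is now safe
            w -= lr * grad
    raise RuntimeError("repair did not converge within max_rounds")
```

Calling `repair(0.2)` starts from an unsafe gain and returns the repaired gain together with the counterexamples collected along the way. Stopping the inner loop as soon as the counterexamples are satisfied keeps the repaired policy close to the original one, which is the toy analogue of avoiding retraining from scratch.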
Taxonomy
Topics: Physical Unclonable Functions (PUFs) and Hardware Security · Neuroscience and Neural Engineering · Reinforcement Learning in Robotics
