Counterexample-Guided Repair of Reinforcement Learning Systems Using Safety Critics

David Boetius; Stefan Leue

arXiv:2405.15430·cs.LG·May 27, 2024

Open Access

TL;DR

This paper introduces a counterexample-guided repair method for reinforcement learning agents that uses safety critics and gradient-based optimization to fix unsafe behaviors without retraining from scratch.

Contribution

It presents a novel repair algorithm that jointly adjusts reinforcement learning agents and safety critics through constrained optimization.

Findings

01 Effective repair of unsafe behaviors in RL agents

02 Joint optimization improves safety and performance

03 Reduces need for extensive retraining

Abstract

Naively trained Deep Reinforcement Learning agents may fail to satisfy vital safety constraints. To avoid costly retraining, we may desire to repair a previously trained reinforcement learning agent to obviate unsafe behaviour. We devise a counterexample-guided repair algorithm for repairing reinforcement learning systems leveraging safety critics. The algorithm jointly repairs a reinforcement learning agent and a safety critic using gradient-based constrained optimisation.
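
The abstract names the ingredients (counterexamples, repair, gradient-based constrained optimisation) but not the algorithm itself. The sketch below is therefore only a generic toy illustration of a counterexample-guided repair loop, not the authors' method. Everything in it is an assumption made for illustration: the linear policy, the action limit, the grid-search "verifier", the quadratic penalty, and all function names and hyperparameters. It also omits the safety critic entirely and constrains the action directly, and it uses distance to the original parameters as a stand-in for preserving performance.

```python
# Toy counterexample-guided repair loop (illustrative assumptions throughout;
# this is NOT the algorithm from the paper and it has no safety critic).

def policy(params, s):
    """Hypothetical linear policy: action = w * s + b."""
    w, b = params
    return w * s + b


def find_counterexample(params, limit, n_grid=201):
    """Grid-search stand-in for a verifier or falsifier.

    Returns the state in [-1, 1] whose action exceeds `limit` the most,
    or None if no grid state violates the limit.
    """
    worst_s, worst_v = None, 0.0
    for i in range(n_grid):
        s = -1.0 + 2.0 * i / (n_grid - 1)
        v = policy(params, s) - limit
        if v > worst_v:
            worst_s, worst_v = s, v
    return worst_s


def repair_step(params, original, counterexamples, limit,
                margin=0.05, mu=10.0, lr=0.001, iters=5000):
    """Gradient descent on a quadratic-penalty objective.

    Objective: squared distance to the original parameters (proxy for
    keeping performance) plus mu * max(0, violation + margin)^2 summed
    over the collected counterexamples.
    """
    w, b = params
    w0, b0 = original
    for _ in range(iters):
        gw = 2.0 * (w - w0)
        gb = 2.0 * (b - b0)
        for s in counterexamples:
            v = w * s + b - limit + margin
            if v > 0.0:
                gw += 2.0 * mu * v * s
                gb += 2.0 * mu * v
        w -= lr * gw
        b -= lr * gb
    return (w, b)


def counterexample_guided_repair(original, limit=0.5, max_rounds=20):
    """Alternate between searching for a violation and repairing against
    all counterexamples found so far, until the search finds none."""
    params = original
    counterexamples = []
    for _ in range(max_rounds):
        s = find_counterexample(params, limit)
        if s is None:
            return params, counterexamples
        counterexamples.append(s)
        params = repair_step(params, original, counterexamples, limit)
    raise RuntimeError("repair did not converge within max_rounds")


if __name__ == "__main__":
    repaired, cexs = counterexample_guided_repair((1.0, 0.0))
    print("repaired parameters:", repaired)
    print("counterexamples used:", cexs)
```

A penalty term only approximates a hard constraint, which is why the sketch tightens the limit by a small margin during repair and re-checks the repaired policy in the next round. Note that the grid search only covers the sampled states, so it does not give a formal safety guarantee.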

Taxonomy

Topics: Physical Unclonable Functions (PUFs) and Hardware Security · Neuroscience and Neural Engineering · Reinforcement Learning in Robotics