Safe Exploration in Model-based Reinforcement Learning using Control Barrier Functions
Max H. Cohen, Calin Belta

TL;DR
This paper introduces a model-based reinforcement learning framework that ensures safety during exploration by using novel Lyapunov-like control barrier functions, enabling safe and efficient learning of optimal control policies.
Contribution
The paper proposes Lyapunov-like control barrier functions (LCBFs) that extend traditional CBFs, facilitating safe exploration in model-based reinforcement learning with safety guarantees.
Findings
Handles more general safety constraints than existing methods
Guarantees safety during online learning and exploration
Demonstrates effectiveness through numerical examples
Abstract
This paper develops a model-based reinforcement learning (MBRL) framework for learning the value function of an infinite-horizon optimal control problem online while obeying safety constraints expressed as control barrier functions (CBFs). Our approach is facilitated by the development of a novel class of CBFs, termed Lyapunov-like CBFs (LCBFs), that retain the beneficial properties of CBFs for developing minimally-invasive safe control policies while also possessing desirable Lyapunov-like qualities such as positive semi-definiteness. We show how these LCBFs can be used to augment a learning-based control policy to guarantee safety and then leverage this approach to develop a safe exploration framework in an MBRL setting. We demonstrate that our approach can handle more general safety constraints than comparable methods via numerical examples.
Taxonomy
Topics: Safety Systems Engineering in Autonomy · Reinforcement Learning in Robotics · Viral Infectious Diseases and Gene Expression in Insects
