Joint Differentiable Optimization and Verification for Certified Reinforcement Learning
Yixuan Wang, Simon Zhan, Zhilu Wang, Chao Huang, Zhaoran Wang, Zhuoran, Yang, Qi Zhu

TL;DR
This paper introduces a joint differentiable framework for reinforcement learning and formal verification, enabling the synthesis of controllers with safety guarantees in safety-critical systems.
Contribution
It proposes a novel bilevel optimization approach that integrates learning and verification, improving the certification process during controller training.
Findings
Outperforms SVG and PPO in safety and stability certification
Successfully synthesizes controllers with barrier and Lyapunov functions
Demonstrates effectiveness across various safety-critical examples
Abstract
In model-based reinforcement learning for safety-critical control systems, it is important to formally certify system properties (e.g., safety, stability) under the learned controller. However, as existing methods typically apply formal verification \emph{after} the controller has been learned, it is sometimes difficult to obtain any certificate, even after many iterations between learning and verification. To address this challenge, we propose a framework that jointly conducts reinforcement learning and formal verification by formulating and solving a novel bilevel optimization problem, which is differentiable by the gradients from the value function and certificates. Experiments on a variety of examples demonstrate the significant advantages of our framework over the model-based stochastic value gradient (SVG) method and the model-free proximal policy optimization (PPO) method in…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsMechanical Circulatory Support Devices · Cardiovascular Function and Risk Factors · Cardiac electrophysiology and arrhythmias
