Joint Differentiable Optimization and Verification for Certified   Reinforcement Learning

Yixuan Wang; Simon Zhan; Zhilu Wang; Chao Huang; Zhaoran Wang; Zhuoran; Yang; Qi Zhu

arXiv:2201.12243·cs.LG·March 22, 2023

Joint Differentiable Optimization and Verification for Certified Reinforcement Learning

Yixuan Wang, Simon Zhan, Zhilu Wang, Chao Huang, Zhaoran Wang, Zhuoran, Yang, Qi Zhu

PDF

Open Access

TL;DR

This paper introduces a joint differentiable framework for reinforcement learning and formal verification, enabling the synthesis of controllers with safety guarantees in safety-critical systems.

Contribution

It proposes a novel bilevel optimization approach that integrates learning and verification, improving the certification process during controller training.

Findings

01

Outperforms SVG and PPO in safety and stability certification

02

Successfully synthesizes controllers with barrier and Lyapunov functions

03

Demonstrates effectiveness across various safety-critical examples

Abstract

In model-based reinforcement learning for safety-critical control systems, it is important to formally certify system properties (e.g., safety, stability) under the learned controller. However, as existing methods typically apply formal verification \emph{after} the controller has been learned, it is sometimes difficult to obtain any certificate, even after many iterations between learning and verification. To address this challenge, we propose a framework that jointly conducts reinforcement learning and formal verification by formulating and solving a novel bilevel optimization problem, which is differentiable by the gradients from the value function and certificates. Experiments on a variety of examples demonstrate the significant advantages of our framework over the model-based stochastic value gradient (SVG) method and the model-free proximal policy optimization (PPO) method in…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsMechanical Circulatory Support Devices · Cardiovascular Function and Risk Factors · Cardiac electrophysiology and arrhythmias