Learning Control Policies for Stochastic Systems with Reach-avoid   Guarantees

{\DJ}or{\dj}e \v{Z}ikeli\'c; Mathias Lechner; Thomas A. Henzinger,; Krishnendu Chatterjee

arXiv:2210.05308·cs.LG·November 30, 2022·1 cites

Learning Control Policies for Stochastic Systems with Reach-avoid Guarantees

{\DJ}or{\dj}e \v{Z}ikeli\'c, Mathias Lechner, Thomas A. Henzinger,, Krishnendu Chatterjee

PDF

Open Access 1 Video

TL;DR

This paper introduces a novel method for learning control policies with formal reach-avoid guarantees for stochastic systems, using neural network certificates called reach-avoid supermartingales, validated on nonlinear RL tasks.

Contribution

It presents the first approach to provide formal reach-avoid guarantees for stochastic systems using neural network certificates, unifying stability and safety over infinite horizons.

Findings

01

Successfully learned control policies with reach-avoid guarantees

02

Verified reach-avoid specifications for fixed policies

03

Fine-tuned policies to satisfy reach-avoid criteria

Abstract

We study the problem of learning controllers for discrete-time non-linear stochastic dynamical systems with formal reach-avoid guarantees. This work presents the first method for providing formal reach-avoid guarantees, which combine and generalize stability and safety guarantees, with a tolerable probability threshold $p \in [0, 1]$ over the infinite time horizon. Our method leverages advances in machine learning literature and it represents formal certificates as neural networks. In particular, we learn a certificate in the form of a reach-avoid supermartingale (RASM), a novel notion that we introduce in this work. Our RASMs provide reachability and avoidance guarantees by imposing constraints on what can be viewed as a stochastic extension of level sets of Lyapunov functions for deterministic systems. Our approach solves several important problems -- it can be used to learn a control…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

Learning Control Policies for Stochastic Systems with Reach-avoid Guarantees· underline

Taxonomy

TopicsReinforcement Learning in Robotics