# Safe Reinforcement Learning with Nonlinear Dynamics via Model Predictive Shielding

**Authors:** Osbert Bastani

arXiv: 1905.10691 · 2020-10-22

## TL;DR

This paper introduces model predictive shielding (MPS), a method for ensuring safety in reinforcement learning for robotic systems with known nonlinear dynamics. MPS switches on the fly between a learned policy and a backup policy to prevent unsafe behavior.

## Contribution

The paper presents MPS, a safety-assurance approach for reinforcement learning with known nonlinear dynamics. It proves that MPS guarantees safety and evaluates it empirically on the cart-pole task.

## Key findings

- MPS provably guarantees safety during policy execution.
- An empirical evaluation on the cart-pole task demonstrates the approach's effectiveness.
- Switching between policies maintains safety without sacrificing performance.

## Abstract

Reinforcement learning is a promising approach to synthesizing policies for challenging robotics tasks. A key problem is how to ensure safety of the learned policy (e.g., that a walking robot does not fall over or that an autonomous car does not run into an obstacle). We focus on the setting where the dynamics are known, and the goal is to ensure that a policy trained in simulation satisfies a given safety constraint. We propose an approach, called model predictive shielding (MPS), that switches on-the-fly between a learned policy and a backup policy to ensure safety. We prove that our approach guarantees safety, and empirically evaluate it on the cart-pole.
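The abstract describes the switching rule only at a high level. The sketch below illustrates one way such a shield can work; it is not the paper's code. It assumes a toy discrete-time double integrator (not the cart-pole), a hypothetical wall constraint `X_MAX`, a braking backup policy, and a hand-picked invariant set. All names (`dynamics`, `backup_policy`, `in_invariant_set`, `backup_recovers`, `shielded_action`) are illustrative. At each step the sketch simulates the learned action one step ahead on the known model, then rolls out the backup policy for a bounded horizon. It keeps the learned action only if that rollout stays safe and reaches the invariant set; otherwise it falls back to the backup action.

```python
from typing import Callable, Tuple

State = Tuple[float, float]  # (position, velocity) of a toy 1-D system

DT = 0.1      # hypothetical time step
X_MAX = 1.0   # hypothetical safety constraint: position must stay <= X_MAX
U_MAX = 1.0   # hypothetical actuator limit


def clip(u: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, u))


def dynamics(s: State, u: float) -> State:
    """Known (toy) dynamics: a discrete-time double integrator."""
    x, v = s
    u = clip(u, -U_MAX, U_MAX)
    return (x + DT * v, v + DT * u)


def is_safe(s: State) -> bool:
    return s[0] <= X_MAX


def backup_policy(s: State) -> float:
    """Hypothetical backup: brake at full force while moving toward the wall."""
    return -U_MAX if s[1] > 0.0 else 0.0


def in_invariant_set(s: State) -> bool:
    """States from which the backup policy stays safe forever in this toy
    model: position is safe and velocity is non-positive (x never grows)."""
    return is_safe(s) and s[1] <= 0.0


def backup_recovers(s: State, horizon: int) -> bool:
    """Roll the backup policy forward on the model; succeed if it reaches the
    invariant set within `horizon` steps without violating safety."""
    for _ in range(horizon + 1):
        if not is_safe(s):
            return False
        if in_invariant_set(s):
            return True
        s = dynamics(s, backup_policy(s))
    return False


def shielded_action(
    s: State, learned: Callable[[State], float], horizon: int = 50
) -> Tuple[float, bool]:
    """Return (action, used_learned). The learned action is used only if the
    backup policy can still recover from the resulting next state."""
    a = learned(s)
    if backup_recovers(dynamics(s, a), horizon):
        return a, True
    return backup_policy(s), False


if __name__ == "__main__":
    # An overly aggressive "learned" policy that always accelerates toward the wall.
    s: State = (0.0, 0.0)
    max_x, switches = s[0], 0
    for _ in range(200):
        a, used = shielded_action(s, lambda st: 1.0)
        switches += 0 if used else 1
        s = dynamics(s, a)
        max_x = max(max_x, s[0])
    print("max position:", max_x, "backup steps:", switches)
```

Because the check runs at every step, the current state always has a verified backup rollout. Falling back to the backup action therefore moves along that verified rollout and never reaches an unrecoverable state. This induction is the intuition behind this style of guarantee, shown here only under the toy assumptions above.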

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1905.10691/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/1905.10691/full.md

## References

30 references — full list in the complete paper: https://tomesphere.com/paper/1905.10691/full.md

---
Source: https://tomesphere.com/paper/1905.10691