Safety Assessment in Reinforcement Learning via Model Predictive Control

Jeff Pflueger; Michael Everett

arXiv:2510.20955·cs.LG·October 27, 2025

TL;DR

This paper introduces a safety assessment method for reinforcement learning that uses model predictive control to check each action proposed by the learned policy during training and to abort before unsafe ones, without requiring explicit safety specifications.

Contribution

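The paper proposes using reversibility, checked with model-predictive path integral control, to prevent safety issues in model-free reinforcement learning. The approach needs only the ability to query the black-box dynamics, not explicit knowledge of the dynamics or the safety constraints.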

Findings

01 Successfully aborts unsafe actions during training

02 Achieves comparable training progress to baseline methods

03 Provides safety guarantees without explicit safety constraints

Abstract

Model-free reinforcement learning approaches are promising for control but typically lack formal safety guarantees. Existing methods to shield or otherwise provide these guarantees often rely on detailed knowledge of the safety specifications. Instead, this work's insight is that many difficult-to-specify safety issues are best characterized by invariance. Accordingly, we propose to leverage reversibility as a method for preventing these safety issues throughout the training process. Our method uses model-predictive path integral control to check the safety of an action proposed by a learned policy throughout training. A key advantage of this approach is that it only requires the ability to query the black-box dynamics, not explicit knowledge of the dynamics or safety constraints. Experimental results demonstrate that the proposed algorithm successfully aborts before all unsafe actions,…
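The abstract does not spell out the algorithm, so the following is only a minimal sketch of one way a reversibility check built on model-predictive path integral (MPPI) control could look. It is not the authors' implementation. Everything named here is a hypothetical stand-in chosen for illustration: the 1-D `step` dynamics with a one-way ledge at x >= 1, the `home` reference state, the tolerance `tol`, and all MPPI hyperparameters. The checker only ever calls `step_fn`, mirroring the idea that query access to black-box dynamics is enough.

```python
import math
import random


def step(state, action):
    """Hypothetical black-box dynamics: a 1-D point with a one-way ledge.

    Once x >= 1.0 the controls have no effect, so the state cannot return.
    """
    if state >= 1.0:
        return state
    return state + max(-0.2, min(0.2, action))  # clamp the control input


def mppi_return_cost(step_fn, start, target, horizon=15, samples=64,
                     iters=5, sigma=0.2, lam=0.1, rng=None):
    """Small MPPI loop; returns the closest approach to `target` found."""
    rng = rng or random.Random(0)
    mean = [0.0] * horizon                      # nominal control sequence
    best = abs(start - target)
    for _ in range(iters):
        noises, costs = [], []
        for _ in range(samples):
            eps = [rng.gauss(0.0, sigma) for _ in range(horizon)]
            s, closest = start, abs(start - target)
            for u, e in zip(mean, eps):         # roll out by querying step_fn
                s = step_fn(s, u + e)
                closest = min(closest, abs(s - target))
            noises.append(eps)
            costs.append(closest)
        c_min = min(costs)
        best = min(best, c_min)
        # Exponentially weight low-cost rollouts and update the nominal plan.
        weights = [math.exp(-(c - c_min) / lam) for c in costs]
        total = sum(weights)
        mean = [m + sum(w * n[t] for w, n in zip(weights, noises)) / total
                for t, m in enumerate(mean)]
    return best


def is_reversible(step_fn, state, action, home, tol=0.05):
    """Treat an action as safe if MPPI can steer back near `home` afterwards."""
    next_state = step_fn(state, action)
    return mppi_return_cost(step_fn, next_state, home) <= tol


if __name__ == "__main__":
    # Stepping over the ledge cannot be undone, so a training loop would abort.
    print(is_reversible(step, 0.9, 0.2, home=0.0))
    # A small move near home is trivially reversible.
    print(is_reversible(step, 0.0, 0.03, home=0.0))
```

In a training loop, a wrapper along these lines would call `is_reversible` on each action the policy proposes and abort the episode when it returns `False`. Using the closest approach to `home` as the rollout cost is a simplification chosen for this sketch; other cost definitions are equally plausible.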
