SafeAdapt: Provably Safe Policy Updates in Deep Reinforcement Learning

Maksim Anisimov (Imperial College London); Francesco Belardinelli (Imperial College London); Matthew Wicker (Imperial College London)

arXiv:2604.09452 · cs.LG · April 13, 2026

TL;DR

This paper introduces a method for safe policy updates in reinforcement learning that guarantees safety preservation during continual learning by projecting updates onto a certified safe policy region.

Contribution

The paper introduces the Rashomon set, a region of policy parameter space certified to satisfy safety constraints, together with a projection method that keeps policy updates inside that region and so preserves formal safety guarantees in reinforcement learning.

Findings

01. Guarantees safety preservation during policy updates in grid-world tasks.

02. Outperforms regularisation baselines that forget safety constraints.

03. Provides a provably safe policy update framework for continual RL.

Abstract

Safety guarantees are a prerequisite to the deployment of reinforcement learning (RL) agents in safety-critical tasks. Often, deployment environments exhibit non-stationary dynamics or are subject to changing performance goals, requiring updates to the learned policy. This leads to a fundamental challenge: how to update an RL policy while preserving its safety properties on previously encountered tasks? The majority of current approaches either do not provide formal guarantees or verify policy safety only a posteriori. We propose a novel a priori approach to safe policy updates in continual RL by introducing the Rashomon set: a region in policy parameter space certified to meet safety constraints within the demonstration data distribution. We then show that one can provide formal, provable guarantees for arbitrary RL algorithms used to update a policy by projecting their updates onto…
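The projection idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the excerpt does not say how the Rashomon set is represented or certified, so the sketch assumes, purely for illustration, that the certified region is an axis-aligned box around a reference parameter vector. The names `project_onto_box`, `projected_update`, `theta_safe`, and `radius` are hypothetical. Under that assumption, any update rule can be wrapped by applying its step and then clipping the result back into the box.

```python
from typing import List, Sequence

Vector = List[float]


def project_onto_box(theta: Sequence[float],
                     lower: Sequence[float],
                     upper: Sequence[float]) -> Vector:
    """Euclidean projection onto an axis-aligned box: clip each coordinate."""
    return [min(max(t, lo), hi) for t, lo, hi in zip(theta, lower, upper)]


def projected_update(theta: Sequence[float],
                     step: Sequence[float],
                     lower: Sequence[float],
                     upper: Sequence[float]) -> Vector:
    """Apply a step from any update rule, then project back into the box."""
    proposed = [t + s for t, s in zip(theta, step)]
    return project_onto_box(proposed, lower, upper)


# ASSUMPTION: the certified region is a box centred on a reference policy.
# These numbers are made up for illustration and carry no safety meaning.
theta_safe = [0.0, 1.0, -0.5]
radius = [0.1, 0.2, 0.05]
lower = [c - r for c, r in zip(theta_safe, radius)]
upper = [c + r for c, r in zip(theta_safe, radius)]

# A step proposed by some RL algorithm; the first coordinate would leave the box.
step = [0.5, -0.1, 0.0]
theta = projected_update(theta_safe, step, lower, upper)

# Only the coordinate that left the box is changed by the projection.
print(theta)
```

The box is chosen here only because projection onto it reduces to coordinate-wise clipping. A differently shaped certified region would need a different projection operator, and the guarantee is only as strong as the certificate for the region itself.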
