A universal policy wrapper with guarantees

Anton Bolychev; Georgiy Malaniya; Grigory Yaremenko; Anastasia Krasnaya; Pavel Osinenko

arXiv:2505.12354·cs.LG·May 20, 2025

Open Access

TL;DR

This paper presents a universal policy wrapper for reinforcement learning that guarantees goal-reaching by switching between a high-performing policy and a safe fallback, ensuring safety without sacrificing performance.

Contribution

The paper introduces a generic wrapper that gives any RL policy formal goal-reaching guarantees without requiring additional system knowledge or online constrained optimization.

Findings

01 The wrapper inherits the fallback policy's goal-reaching guarantee

02 It preserves or improves the base policy's performance

03 It operates without additional system knowledge

Abstract

We introduce a universal policy wrapper for reinforcement learning agents that ensures formal goal-reaching guarantees. In contrast to standard reinforcement learning algorithms that excel in performance but lack rigorous safety assurances, our wrapper selectively switches between a high-performing base policy -- derived from any existing RL method -- and a fallback policy with known convergence properties. The base policy's value function supervises this switching process, determining when the fallback policy should override the base policy to ensure the system remains on a stable path. The analysis proves that our wrapper inherits the fallback policy's goal-reaching guarantees while preserving or improving upon the performance of the base policy. Notably, it operates without needing additional system knowledge or online constrained optimization, making it readily deployable across diverse…
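The switching idea in the abstract can be illustrated with a small sketch. This is not the authors' exact rule, which the abstract does not spell out. It assumes one plausible reading: the base action is accepted only when the base policy's value estimate improves on the best value seen so far by a fixed margin, and the fallback acts otherwise. The names `SwitchingWrapper`, `margin`, `best_value`, and `rollout`, along with the 1-D toy dynamics, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

State = float
Action = float


@dataclass
class SwitchingWrapper:
    """Illustrative wrapper: value-supervised switching (assumed rule)."""

    base_policy: Callable[[State], Action]      # high-performing, no guarantees
    fallback_policy: Callable[[State], Action]  # assumed to reach the goal
    value_fn: Callable[[State], float]          # base policy's value estimate
    margin: float = 1e-3                        # hypothetical improvement threshold
    best_value: float = float("-inf")           # best accepted value so far

    def act(self, state: State) -> Tuple[Action, str]:
        value = self.value_fn(state)
        # Accept the base action only if the value estimate has improved
        # by at least `margin` over the best value accepted so far.
        if value >= self.best_value + self.margin:
            self.best_value = value
            return self.base_policy(state), "base"
        # Otherwise hand control to the fallback policy.
        return self.fallback_policy(state), "fallback"


def rollout(wrapper: SwitchingWrapper, state: State, steps: int) -> Tuple[State, List[str]]:
    """Run the toy system s' = s + a and record which policy acted."""
    sources: List[str] = []
    for _ in range(steps):
        action, source = wrapper.act(state)
        sources.append(source)
        state = state + action
    return state, sources


# Toy setup (all assumptions): the goal is the origin, the base policy
# drifts away from it, and the fallback halves the distance each step.
wrapper = SwitchingWrapper(
    base_policy=lambda s: 0.1,
    fallback_policy=lambda s: -0.5 * s,
    value_fn=lambda s: -abs(s),
)
final_state, sources = rollout(wrapper, state=1.0, steps=20)
```

In this toy run the drifting base policy is overridden whenever its value estimate stops improving, so the state moves toward the goal even though the base policy alone would move away from it.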

Taxonomy

Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Adaptive Dynamic Programming Control

Methods: Balanced Selection