Kernel-Based Safe Exploration in Deep Reinforcement Learning

Rupak Majumdar; Nikhil Singh; Sadegh Soudjani

arXiv:2605.22207·eess.SY·May 22, 2026

TL;DR

This paper introduces KBSE, a kernel-based safe exploration algorithm for deep reinforcement learning that learns barrier functions to ensure probabilistic safety during exploration in unknown stochastic systems.

Contribution

It proposes a method that uses kernel embeddings to learn barrier functions jointly with policies for stochastic systems with unknown dynamics, with the aim of strengthening safety guarantees in deep RL.

Findings

01. KBSE effectively learns safe policies in complex continuous control tasks.

02. The learned barriers provide probabilistic safety guarantees during exploration.

03. KBSE maintains reward performance while ensuring safety.

Abstract

Safety has been a major concern when deploying deep reinforcement learning algorithms in the real world. A promising direction for ensuring that the learned policy does not visit unsafe regions is to learn a barrier function along with the policy. A barrier is a function from states to reals that assigns low values to the initial states, high values to the unsafe states, and decreases in expectation on each transition; such a function can be used to bound the probability of reaching unsafe states. Previous attempts learned a barrier function directly from exploration data, but this required either large amounts of data or restrictions on the system dynamics. In this paper, we show how kernel embeddings can be used to learn barrier functions during deep reinforcement learning for stochastic systems with unknown dynamics. Our algorithm, kernel-based safe exploration (KBSE),…
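To make the barrier conditions in the abstract concrete, the following is a minimal sketch and not the KBSE algorithm itself. It assumes a hypothetical one-dimensional system x' = x * w with w ~ Uniform(0.5, 1.3), a hand-picked candidate barrier B(x) = x^2, an initial set |x| <= 0.2, and an unsafe set |x| >= 1. The expected next barrier value is estimated from sampled transitions with a standard kernel ridge (empirical conditional mean embedding) estimator. Because B is nonnegative and non-increasing in expectation here, Ville's inequality bounds the probability of ever reaching the unsafe set by B(x0) divided by the smallest barrier value on the unsafe set. All names, sets, and parameters below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def step(x, rng):
    # Hypothetical stochastic dynamics (assumption): x' = x * w, w ~ Uniform(0.5, 1.3)
    return x * rng.uniform(0.5, 1.3, size=np.shape(x))

def barrier(x):
    # Candidate barrier (assumption): at most 0.04 on the initial set |x| <= 0.2,
    # at least 1 on the unsafe set |x| >= 1
    return x ** 2

def gauss_kernel(a, b, bandwidth=0.2):
    # Gaussian kernel matrix between two 1-D sample arrays
    d = a[:, None] - b[None, :]
    return np.exp(-d ** 2 / (2 * bandwidth ** 2))

# Transition data (x_i, x'_i) collected from the unknown system
n = 500
X = rng.uniform(-1.0, 1.0, size=n)
X_next = step(X, rng)

# Empirical conditional mean embedding applied to B:
# E[B(x') | x] ~= k(x, X) (K + n*lam*I)^{-1} B(X')
lam = 1e-3
K = gauss_kernel(X, X)
alpha = np.linalg.solve(K + n * lam * np.eye(n), barrier(X_next))

def expected_next_barrier(x):
    x = np.atleast_1d(np.asarray(x, dtype=float))
    return gauss_kernel(x, X) @ alpha

# Estimated expected change of B on a grid of safe states
grid = np.linspace(-0.9, 0.9, 19)
drift = expected_next_barrier(grid) - barrier(grid)
print("largest estimated drift on grid:", drift.max())

# Ville-type bound: P(reach unsafe) <= B(x0) / (min of B on the unsafe set)
x0 = 0.2
unsafe_level = 1.0
ville_bound = barrier(x0) / unsafe_level

# Monte Carlo comparison over a finite horizon
x = np.full(2000, x0)
hit = np.zeros(2000, dtype=bool)
for _ in range(50):
    x = step(x, rng)
    hit |= np.abs(x) >= 1.0
freq = hit.mean()
print("bound:", ville_bound, "empirical frequency:", freq)
```

For this toy system the exact conditional expectation is E[w^2] * x^2, roughly 0.863 * x^2, so the kernel estimate can be compared against a known value. The estimated drift may be slightly positive near x = 0 because of estimation noise, which is one reason a data-driven barrier needs an explicit error margin before it can certify anything.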
