Sampling-Based Safe Reinforcement Learning
Luca Vignola, Bruce D. Lee, Manish Prajapat, Manuel Wendl, Melanie Zeilinger, Andreas Krause, Yarden As

TL;DR
This paper introduces SBSRL, a model-based reinforcement learning algorithm that maintains safety during learning by enforcing constraints jointly across a finite set of sampled dynamics models. It comes with theoretical safety guarantees and is evaluated in simulation and on robotic hardware.
Contribution
The paper presents a novel sampling-based approach for safe RL that offers safety guarantees and scalable deep-ensemble implementations for continuous control.
Findings
SBSRL achieves safe, efficient exploration in simulation and hardware.
SBSRL provides high-probability safety guarantees and a finite-time sample complexity bound.
SBSRL extends to high-dimensional continuous control using deep ensembles.
Abstract
Safe exploration remains a fundamental challenge in reinforcement learning (RL), limiting the deployment of RL agents in the real world. We propose Sampling-Based Safe Reinforcement Learning (SBSRL), a model-based RL algorithm that maintains safety throughout the learning process by enforcing constraints jointly across a finite set of dynamics samples. This formulation approximates an intractable worst-case optimization over uncertain dynamics and enables practical safety guarantees in continuous domains. We further introduce an exploration strategy based on constraining epistemic uncertainty, eliminating the need for explicit exploration bonuses. Under regularity conditions, we derive high-probability guarantees of safety throughout learning and a finite-time sample complexity bound for recovering a near-optimal policy. Empirically, SBSRL achieves safe and efficient exploration both in…
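The idea of enforcing constraints jointly across a finite set of dynamics samples can be illustrated with a small sketch. This is not the authors' implementation: the linear dynamics, the toy reward and cost, the candidate-plan search, and all names (rollout, select_plan, budget) are assumptions made for illustration only. A candidate action sequence is accepted only if its accumulated cost stays within the budget under every sampled model, which approximates the worst case over the uncertain dynamics by a maximum over the samples.

```python
import numpy as np


def rollout(A, B, x0, actions):
    """Simulate x_{t+1} = A x_t + B u_t and return (total_reward, total_cost).

    Toy assumptions: reward is forward progress x[0]; cost is the amount by
    which |x[0]| exceeds 1.0 at each step.
    """
    x = np.array(x0, dtype=float)
    total_reward, total_cost = 0.0, 0.0
    for u in actions:
        x = A @ x + B @ u
        total_reward += float(x[0])
        total_cost += max(0.0, abs(float(x[0])) - 1.0)
    return total_reward, total_cost


def select_plan(models, x0, candidates, budget):
    """Pick the best candidate that satisfies the constraint under ALL models.

    models:     list of (A, B) pairs, standing in for sampled dynamics.
    candidates: list of action sequences, each of shape (horizon, action_dim).
    Returns (plan, value), or (None, -inf) if no candidate is jointly feasible.
    """
    best_plan, best_value = None, -np.inf
    for actions in candidates:
        results = [rollout(A, B, x0, actions) for A, B in models]
        worst_cost = max(cost for _, cost in results)
        if worst_cost > budget:
            continue  # unsafe under at least one sampled model: reject
        value = float(np.mean([reward for reward, _ in results]))
        if value > best_value:
            best_plan, best_value = actions, value
    return best_plan, best_value


if __name__ == "__main__":
    # Two hypothetical dynamics samples that disagree on the input gain.
    A = np.array([[1.0]])
    models = [(A, np.array([[1.0]])), (A, np.array([[2.0]]))]
    candidates = [np.array([[0.8]]), np.array([[0.4]]), np.array([[0.0]])]
    plan, value = select_plan(models, np.zeros(1), candidates, budget=0.0)
    print("selected plan:", plan, "mean reward:", value)
```

In this toy setup the most aggressive plan is rejected because it violates the constraint under one of the two sampled models, even though it would be acceptable under the other alone. Adding more samples can only shrink the set of accepted plans, which is the sense in which the joint constraint is conservative.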
