Enhancing Efficiency of Safe Reinforcement Learning via Sample Manipulation
Shangding Gu, Laixi Shi, Yuhao Ding, Alois Knoll, Costas Spanos, Adam Wierman, Ming Jin

TL;DR
This paper introduces ESPO, a novel safe reinforcement learning method that improves sample efficiency and safety by dynamically manipulating samples during training, with theoretical guarantees and superior empirical performance.
Contribution
The paper proposes ESPO, a new safe RL approach that adaptively adjusts sampling to enhance efficiency and safety, with proven convergence and better sample complexity.
Findings
ESPO outperforms baselines in both reward maximization and safety-constraint satisfaction.
ESPO reduces sample requirements by 25-29% compared to existing methods.
ESPO decreases training time by 21-38%.
Abstract
Safe reinforcement learning (RL) is crucial for deploying RL agents in real-world applications, as it aims to maximize long-term rewards while satisfying safety constraints. However, safe RL often suffers from sample inefficiency, requiring extensive interactions with the environment to learn a safe policy. We propose Efficient Safe Policy Optimization (ESPO), a novel approach that enhances the efficiency of safe RL through sample manipulation. ESPO employs an optimization framework with three modes: maximizing rewards, minimizing costs, and balancing the trade-off between the two. By dynamically adjusting the sampling process based on the observed conflict between reward and safety gradients, ESPO theoretically guarantees convergence, optimization stability, and improved sample complexity bounds. Experiments on the Safety-MuJoCo and Omnisafe benchmarks demonstrate that ESPO…
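The abstract describes two mechanisms: a choice among three optimization modes, and a sample size that adapts to the conflict between reward and safety gradients. The sketch below illustrates that idea only and is not the authors' implementation. The function names, the cost `limit` and `margin` used to pick a mode, the negative-dot-product test for conflict, the choice to draw more samples under conflict, and the `up`/`down` factors are all assumptions made for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def choose_mode(cost, limit, margin):
    """Pick one of three modes from the current expected cost.

    Assumption: a band of width 2 * margin around the cost limit decides
    the mode. The paper's actual switching rule may differ.
    """
    if cost > limit + margin:
        return "minimize_cost"
    if cost < limit - margin:
        return "maximize_reward"
    return "balance"


def next_sample_size(base, reward_dir, cost_dir, up=0.5, down=0.25):
    """Scale the per-update sample count by gradient conflict.

    reward_dir: update direction that increases reward.
    cost_dir:   update direction that decreases cost.
    Assumption: the directions conflict when their cosine is negative;
    conflict draws more samples, agreement draws fewer.
    """
    conflict = cosine(reward_dir, cost_dir) < 0.0
    factor = 1.0 + up if conflict else 1.0 - down
    return max(1, int(round(base * factor)))


if __name__ == "__main__":
    print(choose_mode(cost=30.0, limit=25.0, margin=2.0))
    print(next_sample_size(1000, [1.0, 0.0], [-1.0, 0.0]))
    print(next_sample_size(1000, [1.0, 0.0], [1.0, 0.0]))
```

With these assumed settings, opposing update directions raise the sample count for the next update and aligned directions lower it, which is one plausible way a method could spend samples only where the two objectives disagree.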
Taxonomy
Topics: Reinforcement Learning in Robotics · Anomaly Detection Techniques and Applications
