Enhancing Efficiency of Safe Reinforcement Learning via Sample Manipulation

Shangding Gu; Laixi Shi; Yuhao Ding; Alois Knoll; Costas Spanos; Adam Wierman; Ming Jin

arXiv:2405.20860·cs.LG·June 3, 2024

TL;DR

This paper introduces ESPO, a novel safe reinforcement learning method that improves sample efficiency and safety by dynamically manipulating samples during training, with theoretical guarantees and superior empirical performance.

Contribution

The paper proposes ESPO, a new safe RL approach that adaptively adjusts sampling to enhance efficiency and safety, with proven convergence and better sample complexity.

Findings

01

ESPO outperforms baselines in both reward maximization and safety-constraint satisfaction.

02

ESPO reduces sample requirements by 25-29% compared to existing methods.

03

ESPO decreases training time by 21-38%.

Abstract

Safe reinforcement learning (RL) is crucial for deploying RL agents in real-world applications, as it aims to maximize long-term rewards while satisfying safety constraints. However, safe RL often suffers from sample inefficiency, requiring extensive interactions with the environment to learn a safe policy. We propose Efficient Safe Policy Optimization (ESPO), a novel approach that enhances the efficiency of safe RL through sample manipulation. ESPO employs an optimization framework with three modes: maximizing rewards, minimizing costs, and balancing the trade-off between the two. By dynamically adjusting the sampling process based on the observed conflict between reward and safety gradients, ESPO theoretically guarantees convergence, optimization stability, and improved sample complexity bounds. Experiments on the Safety-MuJoCo and Omnisafe benchmarks demonstrate that ESPO…
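The abstract's two mechanisms, a three-mode objective and a sample budget that reacts to conflict between reward and safety gradients, can be illustrated with a minimal sketch. This is only one possible reading of the abstract, not the paper's algorithm: the function names, the `margin` threshold, the `grow` and `shrink` factors, and the rule "sample more under conflict, less otherwise" are all assumptions made for illustration.

```python
def dot(u, v):
    """Inner product of two equal-length vectors given as lists."""
    return sum(a * b for a, b in zip(u, v))


def pick_mode(cost, limit, margin):
    """Hypothetical three-mode switch (not taken from the paper).

    Well below the cost limit: optimize reward only.
    Well above the cost limit: reduce cost only.
    Near the limit: balance the two objectives.
    """
    if cost < limit - margin:
        return "maximize_reward"
    if cost > limit + margin:
        return "minimize_cost"
    return "balance"


def next_sample_size(d_reward, d_cost, base_size, grow=0.5, shrink=0.25):
    """Hypothetical sample-size rule (assumed, not the paper's).

    d_reward and d_cost are the parameter-update directions each objective
    would prefer. A negative inner product is treated as a conflict, and the
    next batch is enlarged; otherwise it is shrunk to save samples.
    """
    if dot(d_reward, d_cost) < 0:
        return int(base_size * (1 + grow))
    return int(base_size * (1 - shrink))


if __name__ == "__main__":
    mode = pick_mode(cost=10.0, limit=10.0, margin=1.0)
    size = next_sample_size([1.0, 0.0], [-1.0, 0.0], base_size=1000)
    print(mode, size)
```

In this sketch the two decisions are independent: the mode depends only on the estimated cost relative to the limit, while the sample budget depends only on whether the two update directions disagree.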

Code & Models

Repositories

pku-alignment/omnisafe (PyTorch, official)

Taxonomy

Topics: Reinforcement Learning in Robotics · Anomaly Detection Techniques and Applications