# Efficient Exploration Using Extra Safety Budget in Constrained Policy Optimization

**Authors:** Haotian Xu, Shengjie Wang, Zhaolei Wang, Yunzhe Zhang, Qing Zhuo, Yang Gao, and Tao Zhang

arXiv: 2302.14339 · 2023-07-31

## TL;DR

This paper introduces ESB-CPO, a reinforcement learning algorithm that improves exploration efficiency and safety in robotic control by gradually tightening safety constraints during training.

## Contribution

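The paper proposes a method that adds an extra safety budget early in training, loosening the constraints on unsafe transitions with the help of a new metric, to improve exploration. The constraints then tighten as training proceeds, so the cost limit is satisfied by the final training stage.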
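The idea of a budget that is generous early and vanishes later can be illustrated with a small schedule. This is a minimal sketch under stated assumptions: it uses a single global extra budget added to the cost limit `d` and decays it linearly to zero, whereas the paper loosens constraints per unsafe transition using its own metric, which is not reproduced here. The names `extra_safety_budget` and `effective_cost_limit`, the linear decay, and all numeric values are hypothetical.

```python
def extra_safety_budget(step: int, total_steps: int, initial_budget: float) -> float:
    """Hypothetical linear schedule: full extra budget at step 0, zero from total_steps on."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Clamp training progress to [0, 1] so the budget never goes negative.
    progress = min(max(step / total_steps, 0.0), 1.0)
    return initial_budget * (1.0 - progress)


def effective_cost_limit(cost_limit: float, step: int, total_steps: int,
                         initial_budget: float) -> float:
    """Cost limit d loosened by the extra budget; equals d once the budget has decayed."""
    return cost_limit + extra_safety_budget(step, total_steps, initial_budget)


if __name__ == "__main__":
    d = 25.0  # hypothetical cost limit
    for step in (0, 250, 500, 750, 1000):
        print(step, effective_cost_limit(d, step, total_steps=1000, initial_budget=20.0))
```

A linear decay is only one choice; any non-increasing schedule that reaches zero would preserve the property that matters here, namely that the final constraint is the original cost limit.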

## Key findings

- Outperforms baseline algorithms in both safety and optimality on the Safety-Gym and Bullet-Safety-Gym benchmarks.
- Achieves significant performance improvements over baselines under the same cost limit.
- Provides theoretical analysis showing that the constraints tighten gradually so the cost limit is met in the final training stage.

## Abstract

Reinforcement learning (RL) has achieved promising results on most robotic control tasks. The safety of learning-based controllers is essential to ensuring their effectiveness. Current methods enforce the full constraints consistently throughout training, which results in inefficient exploration in the early stage. In this paper, we propose an algorithm named Constrained Policy Optimization with Extra Safety Budget (ESB-CPO) to strike a balance between exploration efficiency and constraint satisfaction. In the early stage, our method loosens the practical constraints on unsafe transitions (adding an extra safety budget) with the aid of a new metric we propose. As training progresses, the constraints in our optimization problem become tighter. Meanwhile, theoretical analysis and practical experiments demonstrate that our method gradually meets the cost limit by the final training stage. When evaluated on the Safety-Gym and Bullet-Safety-Gym benchmarks, our method shows advantages over baseline algorithms in terms of both safety and optimality. Notably, our method achieves a remarkable performance improvement over the baselines under the same cost limit.
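Why a loosened limit helps early exploration can be seen in how a constraint penalty reacts to it. The sketch below deliberately swaps in a Lagrangian relaxation with projected dual ascent on a multiplier `lam`, a common alternative to CPO's trust-region update; it is not the ESB-CPO update and does not use the paper's metric. The fixed episode cost, the linear budget decay, and the helper names `update_multiplier`, `lagrangian`, and `run_demo` are all hypothetical.

```python
from typing import List, Tuple


def update_multiplier(lam: float, episode_cost: float, limit: float, lr: float) -> float:
    """Projected dual ascent: raise lam when cost exceeds the limit, lower it otherwise, keep lam >= 0."""
    return max(0.0, lam + lr * (episode_cost - limit))


def lagrangian(reward_return: float, cost_return: float, lam: float) -> float:
    """Penalized objective J_R - lam * J_C that a Lagrangian policy update would maximize."""
    return reward_return - lam * cost_return


def run_demo(iterations: int = 5, cost_limit: float = 25.0, initial_budget: float = 20.0,
             episode_cost: float = 35.0, lr: float = 0.1) -> List[Tuple[int, float, float]]:
    """Track (iteration, effective limit, multiplier) while the extra budget decays to zero."""
    if iterations < 2:
        raise ValueError("iterations must be at least 2")
    lam = 0.0
    history = []
    for k in range(iterations):
        # Hypothetical linear decay: full budget at k = 0, none at the last iteration.
        budget = initial_budget * (1.0 - k / (iterations - 1))
        limit = cost_limit + budget
        lam = update_multiplier(lam, episode_cost, limit, lr)
        history.append((k, limit, lam))
    return history


if __name__ == "__main__":
    for k, limit, lam in run_demo():
        print(f"iter {k}: limit={limit:.1f} lambda={lam:.2f}")
```

With the same episode cost of 35, the multiplier stays at zero while the loosened limit (45, 40, 35) is not exceeded, so the policy is free to explore; it only starts penalizing cost once the limit tightens toward the original value of 25.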

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2302.14339/full.md

## Figures

32 figures with captions in the complete paper: https://tomesphere.com/paper/2302.14339/full.md

## References

30 references — full list in the complete paper: https://tomesphere.com/paper/2302.14339/full.md

---
Source: https://tomesphere.com/paper/2302.14339