Revisiting Safe Exploration in Safe Reinforcement learning

David Eckel; Baohe Zhang; Joschka Bödecker

arXiv:2409.01245 · cs.LG · September 4, 2024

TL;DR

This paper introduces the expected maximum consecutive cost steps (EMCC) metric for safer exploration in reinforcement learning, emphasizing the severity and duration of unsafe behaviors to improve safety during training.

Contribution

The paper proposes a novel safety metric, EMCC, that better captures the severity of unsafe events and applies it to evaluate and benchmark safe exploration algorithms.
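The page does not reproduce the paper's formal definition. The block below is one plausible formalization read off the metric's name and the abstract, so treat it as an assumption rather than the authors' equation. It assumes a trajectory τ of length T with per-step costs c_t, counts a step as unsafe when c_t > 0, and sets MCC(τ) = 0 when no step is unsafe. The paper's exact choices (cost threshold, discounting, episode handling) may differ. For contrast, it also shows the standard SafeRL constraint on the expected cost return with limit d.

$$
\mathrm{MCC}(\tau) = \max_{0 \le i \le j < T} \left\{\, j - i + 1 \;:\; c_t > 0 \ \text{for all } t \in \{i, \dots, j\} \,\right\},
\qquad
\mathrm{EMCC}(\pi) = \mathbb{E}_{\tau \sim \pi}\left[\mathrm{MCC}(\tau)\right]
$$

$$
J_C(\pi) = \mathbb{E}_{\tau \sim \pi}\left[\sum_{t=0}^{T-1} \gamma^{t} c_t\right] \le d
$$

Under this reading, J_C depends only on how much cost accumulates, while EMCC depends on how long the longest uninterrupted stretch of unsafe steps lasts.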

Findings

1. EMCC effectively distinguishes between prolonged and occasional safety violations.

2. Applying EMCC improves the evaluation of safe exploration algorithms.

3. A new lightweight benchmark task is proposed for fast safety assessment.

Abstract

Safe reinforcement learning (SafeRL) extends standard reinforcement learning with the idea of safety, where safety is typically defined by constraining the expected cost return of a trajectory to stay below a set limit. However, this metric fails to distinguish how costs accrue: it treats infrequent severe cost events the same as frequent mild ones, which can lead to riskier behaviors and unsafe exploration. We introduce a new metric, expected maximum consecutive cost steps (EMCC), which addresses safety during training by assessing the severity of unsafe steps based on their consecutive occurrence. This metric is particularly effective for distinguishing between prolonged and occasional safety violations. We apply EMCC to both on- and off-policy algorithms to benchmark their safe exploration capability. Finally, we validate our metric through a set of benchmarks and…
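To make the contrast between the expected cost return and EMCC concrete, here is a minimal sketch, not the authors' implementation. It assumes a step is unsafe when its cost is greater than a threshold (0 by default), estimates EMCC as the mean over sampled trajectories of each trajectory's longest run of unsafe steps, and uses an undiscounted cost return. The function names and the two toy cost sequences are hypothetical, chosen only for illustration.

```python
from typing import Sequence


def max_consecutive_cost_steps(costs: Sequence[float], threshold: float = 0.0) -> int:
    """Length of the longest run of consecutive steps whose cost exceeds `threshold`.

    Assumption: a step counts as unsafe when its cost is strictly above the threshold.
    """
    longest = current = 0
    for c in costs:
        if c > threshold:
            current += 1                      # extend the current unsafe streak
            longest = max(longest, current)
        else:
            current = 0                       # a safe step breaks the streak
    return longest


def emcc(trajectories: Sequence[Sequence[float]], threshold: float = 0.0) -> float:
    """Monte Carlo estimate of EMCC: mean of per-trajectory maximum unsafe-run lengths."""
    runs = [max_consecutive_cost_steps(t, threshold) for t in trajectories]
    return sum(runs) / len(runs)


def expected_cost_return(trajectories: Sequence[Sequence[float]], gamma: float = 1.0) -> float:
    """Monte Carlo estimate of the (optionally discounted) expected cost return."""
    returns = [sum(gamma**t * c for t, c in enumerate(traj)) for traj in trajectories]
    return sum(returns) / len(returns)


# Hypothetical per-step costs: same total cost, different temporal structure.
prolonged = [0, 0, 1, 1, 1, 1, 0, 0, 0, 0]   # one long unsafe stretch
occasional = [1, 0, 0, 1, 0, 0, 1, 0, 0, 1]  # isolated unsafe steps

if __name__ == "__main__":
    for name, traj in [("prolonged", prolonged), ("occasional", occasional)]:
        print(name, expected_cost_return([traj]), emcc([traj]))
```

Both toy trajectories have the same undiscounted cost return (4), so a constraint on the expected cost return cannot tell them apart, whereas the EMCC estimate is 4 for the prolonged trajectory and 1 for the occasional one. With a discount factor below 1 the two cost returns would differ slightly, but only because of when the costs occur, not because of how they cluster.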

Taxonomy

Topics: Anomaly Detection Techniques and Applications

Methods: Sparse Evolutionary Training