Criticality and Safety Margins for Reinforcement Learning

Alexander Grushin; Walt Woods; Alvaro Velasquez; Simon Khan

arXiv:2409.18289·cs.LG·May 29, 2025

TL;DR

This paper introduces a framework for quantifying the criticality of decisions in reinforcement learning, enabling safer deployment by identifying potentially unsafe situations through interpretable safety margins and ground-truth metrics.

Contribution

It defines true and proxy criticality metrics with a clear significance to users, and demonstrates their effectiveness in predicting unsafe situations in RL agents.

Findings

1. Proxy criticality correlates monotonically with true criticality.
2. Safety margins can identify high-risk decision points.
3. Supervising 5% of decisions could prevent nearly half of agent errors.

Abstract

State-of-the-art reinforcement learning methods sometimes encounter unsafe situations. Identifying when these situations occur is of interest both for post-hoc analysis and during deployment, where it might be advantageous to call out to a human overseer for help. Methods for gauging the criticality of different points in time have been developed, but their accuracy is not well established due to a lack of ground truth, and they are not designed to be easily interpretable by end users. Therefore, we seek to define a criticality framework with both a quantifiable ground truth and a clear significance to users. We introduce true criticality as the expected drop in reward when an agent deviates from its policy for n consecutive random actions. We also introduce the concept of proxy criticality, a low-overhead metric that has a statistically monotonic relationship to true criticality. Safety…
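
The definition of true criticality in the abstract lends itself to a Monte Carlo estimate. The following is a minimal sketch, not the authors' implementation: the chain environment, the always-move-right policy, the undiscounted return, and every name in it (`step`, `policy`, `rollout`, `true_criticality`, `GOAL`, `PIT`) are assumptions made purely for illustration.

```python
import random

# Hypothetical toy environment: a 1-D chain with a pit at 0 and a goal at 8.
GOAL, PIT = 8, 0


def step(pos, action):
    """Move left (-1) or right (+1); return (new_pos, reward, done)."""
    pos += action
    if pos <= PIT:
        return pos, -1.0, True   # fell into the pit
    if pos >= GOAL:
        return pos, 1.0, True    # reached the goal
    return pos, 0.0, False


def policy(pos):
    """Assumed agent policy: always head toward the goal."""
    return +1


def rollout(pos, n_random, rng, max_steps=50):
    """Undiscounted return from `pos` when the first `n_random` actions
    are uniformly random and the remaining ones follow the policy."""
    total = 0.0
    for t in range(max_steps):
        action = rng.choice((-1, +1)) if t < n_random else policy(pos)
        pos, reward, done = step(pos, action)
        total += reward
        if done:
            break
    return total


def true_criticality(pos, n, samples=2000, seed=0):
    """Estimate the expected drop in return caused by n consecutive
    random actions taken from `pos`, relative to following the policy."""
    rng = random.Random(seed)
    # Policy and environment are deterministic here, so one rollout suffices.
    on_policy = rollout(pos, 0, rng)
    deviated = sum(rollout(pos, n, rng) for _ in range(samples)) / samples
    return on_policy - deviated


if __name__ == "__main__":
    for start in (1, 3, 5):
        print(start, true_criticality(start, n=2))
```

In this toy setting, a start position far from the pit (for example position 5 with n = 2) gives an estimate of 0, because no two-step deviation can reach the pit, while position 1 with n = 1 gives an estimate close to 1, because half of the random deviations end the episode in the pit.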

Taxonomy

Topics: Reinforcement Learning in Robotics

Methods: Convolution · Softmax · Dense Connections · Entropy Regularization · A3C