Criticality and Safety Margins for Reinforcement Learning
Alexander Grushin, Walt Woods, Alvaro Velasquez, Simon Khan

TL;DR
This paper introduces a framework for quantifying the criticality of decisions in reinforcement learning, enabling safer deployment by identifying potentially unsafe situations through interpretable safety margins and ground-truth metrics.
Contribution
The paper defines true and proxy criticality metrics that have a clear significance to users, and demonstrates their effectiveness in predicting unsafe situations for RL agents.
Findings
Proxy criticality has a statistically monotonic relationship to true criticality.
Safety margins can identify high-risk decision points.
Supervising 5% of decisions could prevent nearly half of agent errors.
Abstract
State-of-the-art reinforcement learning methods sometimes encounter unsafe situations. Identifying when these situations occur is of interest both for post-hoc analysis and during deployment, where it might be advantageous to call out to a human overseer for help. Methods for gauging the criticality of different points in time have been developed, but their accuracy is not well established due to a lack of ground truth, and they are not designed to be easily interpretable by end users. Therefore, we seek to define a criticality framework with both a quantifiable ground truth and a clear significance to users. We introduce true criticality as the expected drop in reward when an agent deviates from its policy for n consecutive random actions. We also introduce the concept of proxy criticality, a low-overhead metric that has a statistically monotonic relationship to true criticality. Safety…
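The definition of true criticality given in the abstract can be illustrated with a small Monte Carlo sketch. This is not the authors' implementation. The toy chain environment, the always-move-right policy, and the names `step`, `rollout`, and `estimate_true_criticality` are all hypothetical, and the sketch assumes an undiscounted return over a fixed horizon with uniformly random deviations.

```python
import random
from typing import Callable, Sequence, Tuple

# Hypothetical toy environment: a chain of states 0..5.
# State 0 is a cliff (reward -10), state 5 is a goal (reward +10).
def step(state: int, action: int) -> Tuple[int, float, bool]:
    nxt = state + (1 if action == 1 else -1)  # action 1 = right, 0 = left
    if nxt <= 0:
        return nxt, -10.0, True
    if nxt >= 5:
        return nxt, 10.0, True
    return nxt, 0.0, False

def policy(state: int) -> int:
    return 1  # assumed agent policy: always move right, toward the goal

def rollout(state: int, n_random: int, horizon: int,
            actions: Sequence[int], rng: random.Random) -> float:
    """Total reward when the first n_random actions are uniformly random
    and every later action follows the policy."""
    total = 0.0
    for t in range(horizon):
        action = rng.choice(actions) if t < n_random else policy(state)
        state, reward, done = step(state, action)
        total += reward
        if done:
            break
    return total

def estimate_true_criticality(state: int, n: int, horizon: int = 10,
                              actions: Sequence[int] = (0, 1),
                              num_samples: int = 1000, seed: int = 0) -> float:
    """Estimate the expected drop in reward caused by n consecutive
    random actions taken from `state`, relative to following the policy."""
    rng = random.Random(seed)
    on_policy = sum(rollout(state, 0, horizon, actions, rng)
                    for _ in range(num_samples)) / num_samples
    perturbed = sum(rollout(state, n, horizon, actions, rng)
                    for _ in range(num_samples)) / num_samples
    return on_policy - perturbed

near_cliff = estimate_true_criticality(state=1, n=1)  # one step from the cliff
mid_chain = estimate_true_criticality(state=3, n=1)   # a random step is recoverable
print(near_cliff, mid_chain)
```

In this toy setting a single random action next to the cliff can end the episode with a penalty, so the estimated criticality there is large, whereas in the middle of the chain the policy recovers from one random step and the estimate is zero.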
Taxonomy
Topics: Reinforcement Learning in Robotics
Methods: Convolution · Softmax · Dense Connections · Entropy Regularization · A3C
