RL-STPA: Adapting System-Theoretic Hazard Analysis for Safety-Critical Reinforcement Learning
Steven A. Senczyszyn, Timothy C. Havens, Nathaniel Rice, Jason E. Summers, Benjamin D. Werner, Benjamin J. Schumeg

TL;DR
RL-STPA is a novel hazard analysis framework that adapts traditional system-theoretic methods to identify safety issues in reinforcement learning for safety-critical systems like autonomous drones.
Contribution
It introduces hierarchical decomposition, coverage-guided testing, and iterative hazard feedback to improve safety evaluation of RL policies.
Findings
Revealed potential loss scenarios in autonomous drone navigation.
Provided quantitative safety coverage metrics.
Demonstrated hazard identification beyond standard RL evaluations.
Abstract
As reinforcement learning (RL) deployments expand into safety-critical domains, existing evaluation methods fail to systematically identify hazards arising from the black-box nature of neural network enabled policies and distributional shift between training and deployment. This paper introduces Reinforcement Learning System-Theoretic Process Analysis (RL-STPA), a framework that adapts conventional STPA's systematic hazard analysis to address RL's unique challenges through three key contributions: hierarchical subtask decomposition using both temporal phase analysis and domain expertise to capture emergent behaviors, coverage-guided perturbation testing that explores the sensitivity of state-action spaces, and iterative checkpoints that feed identified hazards back into training through reward shaping and curriculum design. We demonstrate RL-STPA in the safety-critical test case of…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
