Systematic Hazard Analysis for Frontier AI using STPA
Simon Mylius

TL;DR
This paper evaluates the use of STPA, a systematic hazard analysis method, to enhance safety assurance in frontier AI systems by identifying hazards more comprehensively and supporting scalable safety analysis.
Contribution
It demonstrates how STPA can be applied to frontier AI safety models, improving hazard detection and robustness over unstructured methods.
Findings
STPA identifies hazards missed by unstructured analysis
Enhances robustness of safety assurance
Supports scalable hazard analysis with LLMs
Abstract
All of the frontier AI companies have published safety frameworks where they define capability thresholds and risk mitigations that determine how they will safely develop and deploy their models. Adoption of systematic approaches to risk modelling, based on established practices used in safety-critical industries, has been recommended; however, frontier AI companies currently do not describe in detail any structured approach to identifying and analysing hazards. STPA (Systems-Theoretic Process Analysis) is a systematic methodology for identifying how complex systems can become unsafe, leading to hazards. It achieves this by mapping out controllers and controlled processes, then analysing their interactions and feedback loops to understand how harmful outcomes could occur (Leveson & Thomas, 2018). We evaluate STPA's ability to broaden the scope, improve traceability and strengthen the…
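The mapping step described in the abstract can be illustrated with a minimal sketch. It assumes a control structure is represented as a list of control actions, each linking a controller to a controlled process, and crosses each action with the four standard STPA categories of unsafe control action. The `ControlAction` class, the `candidate_ucas` helper, and the example controllers are hypothetical names chosen for illustration; they are not taken from the paper.

```python
from dataclasses import dataclass
from itertools import product

# The four standard ways a control action can be unsafe in STPA.
UCA_TYPES = (
    "not provided when needed",
    "provided when it causes a hazard",
    "provided too early, too late, or out of order",
    "stopped too soon or applied too long",
)


@dataclass(frozen=True)
class ControlAction:
    controller: str          # entity issuing the action
    action: str              # the control action itself
    controlled_process: str  # entity or process being controlled


def candidate_ucas(actions):
    """Return one review prompt per (control action, UCA type) pair."""
    return [
        f"{a.controller} -> {a.controlled_process}: '{a.action}' {t}"
        for a, t in product(actions, UCA_TYPES)
    ]


# Hypothetical control structure; names are illustrative only.
actions = [
    ControlAction("Safety team", "approve deployment", "Model release pipeline"),
    ControlAction("Monitoring system", "block response", "Deployed model"),
]

prompts = candidate_ucas(actions)
for p in prompts:
    print(p)
```

Each generated line is only a candidate for an analyst (or an LLM assistant) to assess in context. Deciding whether a candidate actually leads to a hazard still requires judgement about the system and its environment.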
Taxonomy
Topics: Occupational Health and Safety Research · Risk and Safety Analysis
