Clear, Compelling Arguments: Rethinking the Foundations of Frontier AI Safety Cases
Shaun Feakins, Ibrahim Habli, Phillip Morgan

TL;DR
This paper critically examines current safety case approaches for frontier AI and proposes rethinking them on the basis of safety assurance principles, with the aim of more robust, defensible safety frameworks for AI deployment.
Contribution
It offers a comprehensive critique of existing alignment safety cases and introduces a new foundational framework inspired by safety assurance methodologies.
Findings
Current alignment safety cases have significant limitations.
Safety assurance principles can improve AI safety case robustness.
A case study on Deceptive Alignment illustrates the proposed approach.
Abstract
This paper contributes to the nascent debate around safety cases for frontier AI systems. Safety cases are structured, defensible arguments that a system is acceptably safe to deploy in a given context. Historically, they have been used in safety-critical industries, such as aerospace, nuclear or automotive. As a result, safety cases for frontier AI have risen in prominence, both in the safety policies of leading frontier developers and in international research agendas proposed by leaders in generative AI, such as the Singapore Consensus on Global AI Safety Research Priorities and the International AI Safety Report. This paper appraises this work. We note that research conducted within the alignment community which draws explicitly on lessons from the assurance community has significant limitations. We therefore aim to rethink existing approaches to alignment safety cases. We offer…
Taxonomy
Topics: Safety Systems Engineering in Autonomy · Adversarial Robustness in Machine Learning · Ethics and Social Impacts of AI
