Safety case template for frontier AI: A cyber inability argument

Arthur Goemans; Marie Davidsen Buhl; Jonas Schuett; Tomek Korbak; Jessica Wang; Benjamin Hilton; Geoffrey Irving

arXiv:2411.08088 · cs.CY · November 14, 2024 · 2 cites

Open Access

TL;DR

This paper introduces a structured safety case template for frontier AI systems, focusing on cyber risks, using the Claims Arguments Evidence framework to make safety arguments explicit and coherent.

Contribution

It presents a novel safety case template for AI safety assurance, integrating risk models, proxy tasks, and evaluation results within the CAE framework.

Findings

01. The template demonstrates how to structure safety arguments clearly.

02. It connects risk models with evaluation results systematically.

03. It serves as a proof of concept to foster discussion of AI safety cases.

Abstract

Frontier artificial intelligence (AI) systems pose increasing risks to society, making it essential for developers to provide assurances about their safety. One approach to offering such assurances is through a safety case: a structured, evidence-based argument aimed at demonstrating why the risk associated with a safety-critical system is acceptable. In this article, we propose a safety case template for offensive cyber capabilities. We illustrate how developers could argue that a model does not have capabilities posing unacceptable cyber risks by breaking down the main claim into progressively specific sub-claims, each supported by evidence. In our template, we identify a number of risk models, derive proxy tasks from the risk models, define evaluation settings for the proxy tasks, and connect those with evaluation results. Elements of current frontier safety techniques - such as risk…
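
The decomposition described in the abstract (a top-level claim broken into sub-claims, each ultimately backed by evidence) can be pictured as a tree. The following is a minimal sketch in Python, not the authors' template: the `Claim` and `Evidence` classes, the `build_example` helper, and every statement string are hypothetical placeholders chosen for illustration, and the support rule (a leaf needs evidence, a parent needs all sub-claims supported) is a simplifying assumption.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Evidence:
    """A piece of evidence, e.g. an evaluation result (placeholder text only)."""
    description: str


@dataclass
class Claim:
    """A node in a claims/arguments/evidence style tree (hypothetical structure)."""
    statement: str
    subclaims: List["Claim"] = field(default_factory=list)
    evidence: List[Evidence] = field(default_factory=list)

    def is_supported(self) -> bool:
        # Assumed rule: a leaf claim needs at least one piece of evidence;
        # a parent claim needs every one of its sub-claims to be supported.
        if not self.subclaims:
            return bool(self.evidence)
        return all(sub.is_supported() for sub in self.subclaims)

    def unsupported_leaves(self) -> List[str]:
        # Collect the statements of leaf claims that have no evidence attached.
        if not self.subclaims:
            return [] if self.evidence else [self.statement]
        gaps: List[str] = []
        for sub in self.subclaims:
            gaps.extend(sub.unsupported_leaves())
        return gaps


def build_example() -> Claim:
    # All names below are placeholders, not content from the paper.
    task_a = Claim(
        "Model fails proxy task A (placeholder)",
        evidence=[Evidence("Evaluation result for proxy task A (placeholder)")],
    )
    task_b = Claim("Model fails proxy task B (placeholder)")  # no evidence yet
    risk_a = Claim("Risk model A is not enabled (placeholder)", subclaims=[task_a])
    risk_b = Claim("Risk model B is not enabled (placeholder)", subclaims=[task_b])
    return Claim(
        "Model does not have capabilities posing unacceptable cyber risks",
        subclaims=[risk_a, risk_b],
    )


if __name__ == "__main__":
    top = build_example()
    print("Top-level claim supported:", top.is_supported())
    print("Leaf claims missing evidence:", top.unsupported_leaves())
```

Walking the tree this way makes gaps explicit: the top-level claim counts as supported only once every leaf claim has evidence attached.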

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Safety Systems Engineering in Autonomy · Smart Grid Security and Resilience