How Should AI Safety Benchmarks Benchmark Safety?

Cheng Yu; Severin Engelmann; Ruoxuan Cao; Dalia Ali; Orestis Papakyriakopoulos

arXiv:2601.23112 · cs.CY · February 10, 2026

Open Access

TL;DR

This paper reviews 210 AI safety benchmarks, identifies their limitations, and proposes principles and a checklist to improve their validity, reliability, and usefulness in ensuring safer AI systems.

Contribution

It provides a comprehensive analysis of safety benchmark shortcomings and offers a roadmap with principles and a checklist to enhance benchmarking practices.

Findings

01 Many benchmarks fail to address key safety challenges.

02 Applying risk management principles can improve benchmark validity.

03 A new checklist aids in developing robust safety benchmarks.

Abstract

AI safety benchmarks are pivotal for safety in advanced AI systems; however, they have significant technical, epistemic, and sociotechnical shortcomings. We present a review of 210 safety benchmarks that maps out common challenges in safety benchmarking, documenting failures and limitations by drawing from engineering sciences and long-established theories of risk and safety. We argue that adhering to established risk management principles, mapping the space of what can(not) be measured, developing robust probabilistic metrics, and efficiently deploying measurement theory to connect benchmarking objectives with the world can significantly improve the validity and usefulness of AI safety benchmarks. The review provides a roadmap on how to improve AI safety benchmarking, and we illustrate the effectiveness of these recommendations through quantitative and qualitative evaluation. We also…

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Safety Systems Engineering in Autonomy · Risk and Safety Analysis