How Should AI Safety Benchmarks Benchmark Safety?
Cheng Yu, Severin Engelmann, Ruoxuan Cao, Dalia Ali, Orestis Papakyriakopoulos

TL;DR
This paper reviews 210 AI safety benchmarks, identifies their limitations, and proposes principles and a checklist to improve their validity, reliability, and usefulness in ensuring safer AI systems.
Contribution
It provides a comprehensive analysis of safety benchmark shortcomings and offers a roadmap with principles and a checklist to enhance benchmarking practices.
Findings
Many benchmarks fail to address key safety challenges.
Applying risk management principles can improve benchmark validity.
A new checklist aids in developing robust safety benchmarks.
Abstract
AI safety benchmarks are pivotal for safety in advanced AI systems; however, they have significant technical, epistemic, and sociotechnical shortcomings. We present a review of 210 safety benchmarks that maps out common challenges in safety benchmarking, documenting failures and limitations by drawing from engineering sciences and long-established theories of risk and safety. We argue that adhering to established risk management principles, mapping the space of what can(not) be measured, developing robust probabilistic metrics, and efficiently deploying measurement theory to connect benchmarking objectives with the world can significantly improve the validity and usefulness of AI safety benchmarks. The review provides a roadmap on how to improve AI safety benchmarking, and we illustrate the effectiveness of these recommendations through quantitative and qualitative evaluation. We also…
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Safety Systems Engineering in Autonomy · Risk and Safety Analysis
