What AI evaluations for preventing catastrophic risks can and cannot do

Peter Barnett; Lisa Thiergart

arXiv:2412.08653·cs.CY·December 13, 2024

TL;DR

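AI evaluations are useful for certain safety assessments, such as establishing lower bounds on capabilities, but they face fundamental limitations: they cannot establish upper bounds on capabilities, reliably forecast future capabilities, or robustly assess risks from autonomous AI systems. Evaluations should therefore be supplemented with other safety measures rather than relied on as the main safeguard.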

Contribution

The paper critically analyzes what current AI evaluation methods can and cannot do to prevent catastrophic risks, and identifies limitations that cannot be overcome within the current paradigm.

Findings

01. Evaluations can establish lower bounds on AI capabilities.

02. Evaluations can assess certain misuse risks.

03. Fundamental limitations prevent establishing upper bounds or reliably forecasting future capabilities.

Abstract

AI evaluations are an important component of the AI governance toolkit, underlying current approaches to safety cases for preventing catastrophic risks. Our paper examines what these evaluations can and cannot tell us. Evaluations can establish lower bounds on AI capabilities and assess certain misuse risks given sufficient effort from evaluators. Unfortunately, evaluations face fundamental limitations that cannot be overcome within the current paradigm. These include an inability to establish upper bounds on capabilities, reliably forecast future model capabilities, or robustly assess risks from autonomous AI systems. This means that while evaluations are valuable tools, we should not rely on them as our main way of ensuring AI systems are safe. We conclude with recommendations for incremental improvements to frontier AI safety, while acknowledging these fundamental limitations…

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Risk Perception and Management