Who Should Run Advanced AI Evaluations -- AISIs?
Merlin Stein, Milan Gandhi, Theresa Kriecherbauer, Amin Oueslati, Robert Trager

TL;DR
This paper analyzes how public and private entities should share responsibility for evaluating the safety of advanced AI. It emphasizes that public bodies need to be involved in safety-critical assessments and need substantial capacity to provide effective oversight.
Contribution
It provides a framework for distributing AI evaluation responsibilities between the public and private sectors based on industry context and risk. It also estimates the public evaluation capacity that effective oversight requires.
Findings
Public bodies should conduct safety-critical evaluations, especially those that require gray- or white-box access to models.
Private evaluators can efficiently conduct governance and black-box assessments under public oversight.
Public evaluation capacity must scale with industry risk, potentially requiring hundreds of staff in large jurisdictions.
Abstract
Artificial Intelligence (AI) Safety Institutes and governments worldwide are deciding whether they evaluate advanced AI themselves, support a private evaluation ecosystem or do both. Evaluation regimes have been established in a wide range of industry contexts to monitor and evaluate firms' compliance with regulation. Evaluation is a necessary governance tool to understand and manage the risks of a technology. This paper draws from nine such regimes to inform (i) who should evaluate which parts of advanced AI; and (ii) how much capacity public bodies may need to evaluate advanced AI effectively. First, the effective responsibility distribution between public and private evaluators depends heavily on specific industry and evaluation conditions. On the basis of advanced AI's risk profile, the sensitivity of information involved in the evaluation process, and the high costs of verifying…
