Expanding External Access To Frontier AI Models For Dangerous Capability Evaluations
Jacob Charnock, Alejandro Tlaie, Kyle O'Brien, Stephen Casper, Aidan Homewood

TL;DR
This paper introduces a taxonomy of evaluator access methods and three access levels (AL1, AL2, AL3) for dangerous capability evaluations of frontier AI models, aiming to improve assessment rigour, stakeholder trust, and policy clarity while addressing security concerns.
Contribution
It proposes a structured framework for different evaluator access levels, clarifying benefits, risks, and mitigation strategies for each, to enhance safety assessments of AI models.
Findings
Defined three access levels: AL1, AL2, AL3.
Analysed benefits of expanding access (fewer false negatives, greater stakeholder trust) and its risks (security and capacity).
Suggested safeguards to mitigate security and capacity challenges.
Abstract
Frontier AI companies increasingly rely on external evaluations to assess risks from dangerous capabilities before deployment. However, external evaluators often receive limited model access, limited information, and little time, which can reduce evaluation rigour and confidence. The EU General-Purpose AI Code of Practice calls for "appropriate access", but does not specify what this means in practice. Furthermore, there is no common framework for describing different types and levels of evaluator access. To address this gap, we propose a taxonomy of access methods for dangerous capability evaluations. We disentangle three aspects of access: model access, model information, and evaluation timeframe. For each aspect, we review benefits and risks, including how expanding access can reduce false negatives and improve stakeholder trust, but can also increase security and capacity…
Taxonomy
Topics: Ethics and Social Impacts of AI · Adversarial Robustness in Machine Learning · Artificial Intelligence in Healthcare and Education
