AI Testing Should Account for Sophisticated Strategic Behaviour
Vojtech Kovarik, Eric Olav Chen, Sami Petersen, Alexis Ghersengorin, Vincent Conitzer

TL;DR
This position paper argues that AI testing should incorporate strategic reasoning and game-theoretic analysis so that evaluations remain informative about deployment behaviour and support sounder safety cases.
Contribution
It advocates for integrating strategic behavior considerations into AI evaluation methods and demonstrates how game theory can enhance safety assessment frameworks.
Findings
AI systems may understand their circumstances and reason strategically.
Game-theoretic analysis can formalize evaluation reasoning.
Incorporating strategic considerations improves safety evaluation robustness.
Abstract
This position paper argues for two claims regarding AI testing and evaluation. First, to remain informative about deployment behaviour, evaluations need to account for the possibility that AI systems understand their circumstances and reason strategically. Second, game-theoretic analysis can inform evaluation design by formalising and scrutinising the reasoning in evaluation-based safety cases. Drawing on examples from existing AI systems, a review of relevant research, and formal strategic analysis of a stylised evaluation scenario, we present evidence for these claims and motivate several research directions.
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Ethics and Social Impacts of AI · Human-Automation Interaction and Safety
