Incentive-Aware AI Safety via Strategic Resource Allocation: A Stackelberg Security Games Perspective

Cheol Woo Kim; Davin Choo; Tzeh Yuan Neoh; Milind Tambe

arXiv:2602.07259 · cs.AI · February 10, 2026

TL;DR

This paper introduces a game-theoretic framework based on Stackelberg Security Games to improve AI safety by strategically allocating oversight resources and accounting for adversarial incentives throughout the AI lifecycle.

Contribution

The paper applies Stackelberg Security Games to model AI safety oversight, integrating incentive design and adversarial considerations into a unified strategic framework.

Findings

01. Framework informs training-time auditing against poisoning attacks

02. Guides resource-constrained pre-deployment evaluation

03. Supports robust multi-model deployment in adversarial settings

Abstract

As AI systems grow more capable and autonomous, ensuring their safety and reliability requires not only model-level alignment but also strategic oversight of the humans and institutions involved in their development and deployment. Existing safety frameworks largely treat alignment as a static optimization problem (e.g., tuning models to desired behavior) while overlooking the dynamic, adversarial incentives that shape how data are collected, how models are evaluated, and how they are ultimately deployed. We propose a new perspective on AI safety grounded in Stackelberg Security Games (SSGs): a class of game-theoretic models designed for adversarial resource allocation under uncertainty. By viewing AI oversight as a strategic interaction between defenders (auditors, evaluators, and deployers) and attackers (malicious actors, misaligned contributors, or worst-case failure modes), SSGs…
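
The defender-attacker interaction described in the abstract can be illustrated with a toy example. The sketch below is not the paper's formulation. It assumes a zero-sum simplification in which a defender (say, an auditor) spreads a divisible coverage budget over targets (say, data sources to audit), and an attacker who observes that commitment then strikes the target with the highest expected payoff. The function names, the target values, and the budget are all hypothetical.

```python
def attacker_payoffs(values, coverage):
    """Attacker's expected payoff per target: value times probability of no coverage."""
    return [v * (1.0 - c) for v, c in zip(values, coverage)]


def stackelberg_coverage(values, budget, iters=60):
    """Defender's commitment in a toy zero-sum security game (an assumed model).

    The defender minimizes the attacker's best-response payoff
    max_i values[i] * (1 - coverage[i]) subject to sum(coverage) <= budget
    and 0 <= coverage[i] <= 1. Bisection searches for the smallest payoff
    level t the budget can enforce; each target is then covered just enough
    to push its payoff down to t. All values must be positive.
    """
    lo, hi = 0.0, max(values)  # hi is always feasible: it needs zero coverage
    for _ in range(iters):
        t = (lo + hi) / 2.0
        needed = sum(max(0.0, 1.0 - t / v) for v in values)
        if needed > budget:
            lo = t  # cannot afford this payoff level, relax it
        else:
            hi = t  # affordable, try to squeeze the attacker further
    return [max(0.0, 1.0 - hi / v) for v in values]


if __name__ == "__main__":
    values = [10.0, 6.0, 2.0]  # hypothetical damage if each target is hit unaudited
    budget = 1.0               # hypothetical: one unit of audit effort in total
    coverage = stackelberg_coverage(values, budget)
    print([round(c, 3) for c in coverage])
    print(round(max(attacker_payoffs(values, coverage)), 3))
```

With these made-up numbers the coverage comes out close to 0.625, 0.375, and 0: the low-value target is left unaudited because the attacker would not choose it anyway, and the attacker's best payoff falls to about 3.75, compared with roughly 6.67 under a uniform split of the same budget. Bisection was chosen here only to keep the sketch dependency-free; general SSGs with non-zero-sum payoffs or scheduling constraints typically need a linear or mixed-integer program instead.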

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Information and Cyber Security · Smart Grid Security and Resilience