Extending the Formalism and Theoretical Foundations of Cryptography to AI
Federico Villa, F. Bet\"ul Durak, Tadayoshi Kohno, Tapdig Maharramli, Franziska Roesner

TL;DR
This paper develops a formal foundation for understanding and analyzing the security of AI agents, especially language models, by creating a unified framework that captures various security aspects and enables principled system design.
Contribution
It introduces a formal security framework for AI agents, including an attack taxonomy, a security game model, and a modular approach to security objectives, advancing the theoretical understanding of AI system security.
Findings
Existing confidentiality approaches conflict with system completeness.
A modular decomposition of helpfulness and harmlessness is sound.
Formal security reductions are necessary for principled AI system design.
Abstract
Recent progress in (Large) Language Models (LMs) has enabled the development of autonomous LM-based agents capable of executing complex tasks with minimal supervision. These agents have started to be integrated into systems with significant autonomy and authority. The security community has been studying their security. One emerging direction to mitigate security risks is to constrain agent behaviours via access control and permissioning mechanisms. Existing permissioning proposals, however, remain difficult to compare due to the absence of a shared formal foundation. This work provides such a foundation. We first systematize the landscape by constructing an attack taxonomy tailored to language models, the computational primitives of agentic systems. We then develop a formal treatment of agentic access control by defining an AIOracle algorithmically and introducing a security-game…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsAdversarial Robustness in Machine Learning · Cryptography and Data Security · Access Control and Trust
