Extending the Formalism and Theoretical Foundations of Cryptography to AI

Federico Villa; F. Betül Durak; Tadayoshi Kohno; Tapdig Maharramli; Franziska Roesner

arXiv:2603.02590 · cs.CR · March 4, 2026

Open Access

TL;DR

This paper develops a formal foundation for understanding and analyzing the security of AI agents, especially language models, by creating a unified framework that captures various security aspects and enables principled system design.

Contribution

It introduces a formal security framework for AI agents, including an attack taxonomy, a security game model, and a modular approach to security objectives, advancing the theoretical understanding of AI system security.

Findings

01. Existing confidentiality approaches conflict with system completeness.

02. A modular decomposition of helpfulness and harmlessness is sound.

03. Formal security reductions are necessary for principled AI system design.

Abstract

Recent progress in (large) language models (LMs) has enabled the development of autonomous LM-based agents capable of executing complex tasks with minimal supervision. These agents are increasingly integrated into systems where they hold significant autonomy and authority, and the security community has been studying the resulting risks. One emerging direction for mitigating those risks is to constrain agent behaviours through access control and permissioning mechanisms. Existing permissioning proposals, however, remain difficult to compare because they lack a shared formal foundation. This work provides such a foundation. We first systematize the landscape by constructing an attack taxonomy tailored to language models, the computational primitives of agentic systems. We then develop a formal treatment of agentic access control by defining an AIOracle algorithmically and introducing a security-game…

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Cryptography and Data Security · Access Control and Trust