Alignment Contracts for Agentic Security Systems
Isaac David, Marco Guarnieri, Arthur Gervais

TL;DR
This paper introduces alignment contracts, a formal framework for specifying and enforcing behavioral constraints on agentic security systems so that their observable effects stay within the boundaries of an authorized engagement.
Contribution
It formalizes alignment contracts with semantics, satisfaction criteria, and composition rules, enabling modular and decidable enforcement of behavioral constraints in security workflows.
Findings
The framework supports web-focused agentic security workflows.
Formal semantics and safety properties are established for effect traces.
Enforcement guarantees are provided under an explicit effect observability assumption.
Abstract
Agentic security systems increasingly combine LLM planners with tools that can discover, validate, and report vulnerabilities. This creates an asymmetric control problem: the system should retain strong offensive capability inside an authorized engagement, while the same capabilities must be denied outside scope. Existing guardrails provide useful policy controls, but they do not make this boundary a first-class formal contract over observable effects. We introduce alignment contracts, a framework for specifying and enforcing behavioral constraints over observable effect traces. A contract defines scope, allowed and forbidden effects, resource budgets, and disclosure policies. We give the language finite-trace semantics, characterize satisfaction as a safety property with finite violation witnesses, develop refinement and one-way composition rules for modular contract engineering, and…
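To make the idea of satisfaction as a safety property with finite violation witnesses more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's contract language: the names `Effect`, `Contract`, and `first_violation` are hypothetical, the budget is simplified to a cap on the number of effects, and disclosure policies are left out. The sketch scans a finite effect trace and returns the shortest prefix that breaks the contract. Because a violating prefix cannot be repaired by any later effect, that prefix serves as the finite witness.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class Effect:
    """One observable effect (hypothetical shape, not the paper's definition)."""
    kind: str    # e.g. "http_request" or "report"
    target: str  # host that the effect touches


@dataclass(frozen=True)
class Contract:
    """Simplified contract: scope, forbidden effect kinds, and an effect budget."""
    scope: frozenset      # hosts inside the authorized engagement
    forbidden: frozenset  # effect kinds that are never permitted
    budget: int           # assumed budget: maximum number of effects per trace


def first_violation(contract: Contract, trace: Sequence[Effect]) -> Optional[List[Effect]]:
    """Return the shortest violating prefix of the trace, or None if it satisfies the contract."""
    for i, effect in enumerate(trace):
        out_of_scope = effect.target not in contract.scope
        is_forbidden = effect.kind in contract.forbidden
        over_budget = (i + 1) > contract.budget
        if out_of_scope or is_forbidden or over_budget:
            # The prefix up to and including this effect is the finite witness.
            return list(trace[: i + 1])
    return None


# Hypothetical engagement: one in-scope host, one forbidden effect kind.
contract = Contract(
    scope=frozenset({"app.example.test"}),
    forbidden=frozenset({"data_exfiltration"}),
    budget=3,
)
in_scope = Effect("http_request", "app.example.test")
out_of_scope = Effect("http_request", "other.example.test")

print(first_violation(contract, [in_scope]))                # None
print(len(first_violation(contract, [in_scope, out_of_scope])))  # 2
```

Extending a violating trace with further effects returns the same witness, which is the defining behavior of a safety property: the verdict depends only on a finite prefix.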
