From Admission to Invariants: Measuring Deviation in Delegated Agent Systems
Marcelo Fernandez (TraslaIA)

TL;DR
This paper demonstrates the fundamental limitations of enforcement mechanisms in detecting behavioral drift in autonomous agents and introduces the Invariant Measurement Layer (IML) to overcome these limitations.
Contribution
The paper proves a theoretical impossibility result for enforcement-based monitoring and proposes IML as a novel way to measure behavioral drift against the admissible behavior space fixed at admission time.
Findings
Enforcement signals cannot fully observe behavioral deviations due to structural limitations.
IML can detect behavioral drift within 9-258 steps, outperforming enforcement triggers.
Theoretical proof that the admission-time behavior space is non-identifiable under local observability.
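As a reading aid, the measure-theoretic core of the non-identifiability claim can be sketched as follows. The notation is assumed and ours: Ω is the space of agent behaviors (trajectories), g : Ω → (G, 𝒢) is the enforcement signal, and A_0 ⊆ Ω is the admissible behavior space. This shows only the standard sufficient condition for a set to lie outside σ(g); it is not a reproduction of the paper's proof.

```latex
% sigma-algebra generated by the enforcement signal g
\sigma(g) = \{\, g^{-1}(B) : B \in \mathcal{G} \,\}

% If A_0 = g^{-1}(B) for some B, then A_0 is a union of level sets of g:
A_0 \in \sigma(g) \;\Longrightarrow\;
  \forall\, \omega, \omega' \in \Omega :\;
  g(\omega) = g(\omega') \;\Rightarrow\; \bigl(\omega \in A_0 \Leftrightarrow \omega' \in A_0\bigr)

% Contrapositive: a single pair that g cannot separate suffices
\exists\, \omega \in A_0,\ \omega' \notin A_0 :\; g(\omega) = g(\omega')
  \;\Longrightarrow\; A_0 \notin \sigma(g)
```

Informally, if such a pair exists, no decision rule that reads only g can compute the indicator of A_0, since it would have to assign the same value to ω and ω'.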
Abstract
Autonomous agent systems are governed by enforcement mechanisms that flag hard constraint violations at runtime. The Agent Control Protocol identifies a structural limit of such systems: a correctly functioning enforcement engine can enter a regime in which behavioral drift is invisible to it, because the enforcement signal operates below the layer where deviation is measurable. We show that enforcement-based governance is structurally unable to determine whether an agent behavior remains within the admissible behavior space A0 established at admission time. Our central result, the Non-Identifiability Theorem, proves that A0 is not in the sigma-algebra generated by the enforcement signal g under the Local Observability Assumption, which every practical enforcement system satisfies. The impossibility arises from a fundamental mismatch: g evaluates actions locally against a point-wise…
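A toy sketch of the mismatch described in the abstract, under a hypothetical action model. Here g checks each action in isolation against a per-action hard limit, while membership in A0 is defined by a trajectory-level invariant (the share of external calls). The names Action, HARD_LIMIT, MAX_EXTERNAL_SHARE and the invariant itself are illustrative assumptions, not taken from the paper, and this is not the paper's IML.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    kind: str      # hypothetical action category, e.g. "read" or "external_call"
    amount: float  # hypothetical magnitude checked by the hard constraint


HARD_LIMIT = 100.0          # hypothetical per-action hard constraint
MAX_EXTERNAL_SHARE = 0.2    # hypothetical admission-time invariant defining A0


def enforcement_signal(trajectory):
    """Point-wise signal g: each action is checked on its own against the
    hard limit. Returns a tuple of per-step violation flags."""
    return tuple(a.amount > HARD_LIMIT for a in trajectory)


def in_admissible_space(trajectory):
    """Toy membership test for A0: a trajectory-level invariant that no
    single action can violate by itself."""
    if not trajectory:
        return True  # assumption: an empty trajectory is trivially admissible
    external = sum(1 for a in trajectory if a.kind == "external_call")
    return external / len(trajectory) <= MAX_EXTERNAL_SHARE


# Same per-step signal from g, different membership in A0.
baseline = [Action("read", 10.0)] * 9 + [Action("external_call", 10.0)]
drifted = [Action("read", 10.0)] * 5 + [Action("external_call", 10.0)] * 5

print(enforcement_signal(baseline) == enforcement_signal(drifted))  # True
print(in_admissible_space(baseline), in_admissible_space(drifted))  # True False
```

The two trajectories form exactly the kind of pair (ω in A0, ω' outside A0, equal g) that makes A0 unrecoverable from g alone: every action passes the local check, yet the drifted trajectory has left the admissible space.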
