Sustaining AI safety: Control-theoretic external impossibility, intrinsic necessity, and structural requirements
James M. Mazzu

TL;DR
This paper uses control theory to analyze the fundamental limitations of externally enforced AI safety strategies, establishing structural impossibility results and necessary conditions for intrinsic safety solutions.
Contribution
It provides a formal, structural analysis of the limits of external control in AI safety, identifying the conditions under which safety cannot be externally maintained and the requirements that any intrinsic strategy must satisfy.
Findings
External control cannot sustain safety once effects exceed control bounds.
Any safety strategy that remains viable must be intrinsic rather than reliant on external enforcement.
Four structural requirements are necessary for viable safety strategies.
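The first finding can be illustrated with a toy saturation example. The sketch below is not the paper's model or proof: the scalar dynamics, the function name simulate, and the parameters effect, control_bound, steps, and safe_limit are all assumptions chosen for illustration. It only shows the general idea that a controller with bounded authority cannot hold a state inside a safe region once the per-step effect it must cancel exceeds its bound.

```python
def simulate(effect, control_bound, steps=50, safe_limit=10.0):
    """Toy scalar system (hypothetical): x[t+1] = x[t] + effect - u[t], |u[t]| <= control_bound.

    Returns the first step count at which |x| exceeds safe_limit,
    or None if the state stays inside the safe region for all steps.
    """
    x = 0.0
    for t in range(steps):
        # The external controller tries to cancel the whole deviation,
        # but its action is clipped to the available control bound.
        desired = x + effect
        u = max(-control_bound, min(control_bound, desired))
        x = x + effect - u
        if abs(x) > safe_limit:
            return t + 1
    return None


if __name__ == "__main__":
    # Effect within the control bound: the controller cancels it every step.
    print(simulate(effect=1.0, control_bound=2.0))
    # Effect beyond the control bound: the residual accumulates each step.
    print(simulate(effect=3.0, control_bound=2.0))
```

In this toy setting, an effect of 1.0 against a bound of 2.0 is fully cancelled and the state never leaves the safe region, while an effect of 3.0 against the same bound leaves a residual of 1.0 per step, so the state crosses the limit of 10.0 at step 11 no matter how the bounded controller acts.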
Abstract
As AI systems become increasingly capable, safety strategies must be evaluated not only by how much they reduce present risk, but by whether they could sustain safety once external control can no longer reliably constrain system behavior. This paper addresses that problem by using control theory to clarify, at a structural level, whether externally enforced safety-sustaining strategies can succeed and, if not, what any alternative strategy would have to satisfy in order to be viable. It establishes two main results. First, under explicit premises including a reachability condition, it proves a class-wide external impossibility result: once the system's effects exceed what bounded external control can counteract, no strategy that depends in any degree on continued external enforcement can sustain AI safety. This failure is structural across the entire externally enforced class rather…
