Out of Control -- Why Alignment Needs Formal Control Theory (and an Alignment Control Stack)
Elija Perrier

TL;DR
This paper advocates for integrating formal optimal control theory into AI alignment research, proposing a hierarchical control stack to improve understanding, interoperability, and safety of advanced AI systems.
Contribution
It introduces the Alignment Control Stack, a hierarchical framework combining control theory with AI alignment to enhance safety and interoperability.
Findings
Proposes a hierarchical Alignment Control Stack for AI systems.
Highlights the importance of formal control methods for safety.
Suggests control theory can improve alignment robustness.
Abstract
This position paper argues that formal optimal control theory should be central to AI alignment research, offering a perspective distinct from prevailing AI safety and security approaches. While recent work in AI safety and mechanistic interpretability has advanced formal methods for alignment, these methods often fall short of the generalisation required of control frameworks for other technologies. There is also a lack of research into how to render different alignment/control protocols interoperable. We argue that by recasting alignment through the principles of formal optimal control, and by framing alignment in terms of a hierarchical stack, from physical to socio-technical layers, according to which controls may be applied, we can develop a better understanding of the potential and limitations for controlling frontier models and agentic AI systems. To this end, we introduce an Alignment Control Stack…
