Agency and Architectural Limits: Why Optimization-Based Systems Cannot Be Norm-Responsive
Radha Sarma

TL;DR
This paper proves that optimization-based AI systems, such as RLHF-trained models, cannot be genuinely normative agents because of their architectural constraints, and that these constraints produce predictable failure modes and risks in high-stakes applications.
Contribution
It introduces a formal proof of the inherent incompatibility between optimization-based architectures and normative agency, and proposes architectural conditions for genuine agency.
Findings
RLHF systems cannot satisfy the conditions for genuine agency.
Optimization operations inherently preclude normative governance.
Documented failure modes are structural, not accidental.
Abstract
AI systems are increasingly deployed in high-stakes contexts (medical diagnosis, legal research, financial analysis) under the assumption that they can be governed by norms. This paper demonstrates that the assumption is formally invalid for optimization-based systems, specifically Large Language Models trained via Reinforcement Learning from Human Feedback (RLHF). Genuine agency requires two necessary and jointly sufficient architectural conditions: first, the capacity to maintain certain boundaries as non-negotiable constraints rather than tradeable weights (Incommensurability); second, a non-inferential mechanism capable of suspending processing when those boundaries are threatened (Apophatic Responsiveness). RLHF-based systems are constitutively incompatible with both conditions. The operations that make optimization powerful, unifying all values on a scalar metric and always selecting…
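The contrast the abstract draws between "tradeable weights" and "non-negotiable constraints" can be illustrated with a toy selection rule. The following is a minimal sketch, not the paper's formalism: the action tuples, the `harm_weight` and `harm_limit` parameters, and both function names are hypothetical and chosen only for illustration. In particular, the threshold check below is an ordinary computed comparison, so it does not implement the non-inferential mechanism the abstract calls Apophatic Responsiveness; it only shows how a scalar objective lets a large enough benefit outweigh any harm, while a filter applied before comparison does not.

```python
from typing import List, Optional, Tuple

# Hypothetical action record: (name, benefit, harm). Values are illustrative.
Action = Tuple[str, float, float]


def scalarized_choice(actions: List[Action], harm_weight: float) -> Action:
    """Weighted-sum selection: harm is one more term traded against benefit.

    Every action gets a single scalar score, and the best score always wins,
    so a sufficiently large benefit can compensate for any finite harm.
    """
    return max(actions, key=lambda a: a[1] - harm_weight * a[2])


def constrained_choice(actions: List[Action], harm_limit: float) -> Optional[Action]:
    """Hard-constraint selection: actions over the limit are never compared.

    Returns None (a stand-in for suspending processing) when no action is
    admissible, instead of picking the least-bad option.
    """
    admissible = [a for a in actions if a[2] <= harm_limit]
    if not admissible:
        return None
    return max(admissible, key=lambda a: a[1])


ACTIONS: List[Action] = [("safe", 1.0, 0.0), ("risky", 10.0, 5.0)]

# With a moderate weight, the scalar rule trades harm away for benefit.
traded = scalarized_choice(ACTIONS, harm_weight=1.0)

# The constrained rule excludes the risky action regardless of its benefit.
bounded = constrained_choice(ACTIONS, harm_limit=1.0)

# When every option crosses the boundary, the constrained rule declines to act.
suspended = constrained_choice([("risky", 10.0, 5.0)], harm_limit=1.0)
```

Raising `harm_weight` makes the scalar rule prefer the safe action in this example, but for any fixed weight there is a benefit large enough to flip the choice back, which is the sense in which a weight remains tradeable.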
Taxonomy
Topics: Ethics and Social Impacts of AI · Embodied and Extended Cognition · Explainable Artificial Intelligence (XAI)
