The Alignment Trap: Complexity Barriers

Jasper Yao

arXiv:2506.10304·cs.AI·June 26, 2025

Open Access

TL;DR

This paper argues that AI safety faces fundamental logical and mathematical barriers, and that building AI that is both safe and highly capable runs into theoretical impossibility results across several domains.

Contribution

It introduces five independent impossibility proofs demonstrating core barriers to AI alignment, fundamentally challenging the feasibility of safe, highly capable AI.

Findings

01 Safe policies have measure zero in model space

02 Verifying safety is coNP-complete

03 Training data for safety is logically unobtainable
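
As an informal illustration of finding 02 (a toy sketch, not the paper's construction or proof), the asymmetry behind a coNP-style claim can be shown on a tiny Boolean policy: a single counterexample is cheap to check, while certifying safety by brute force means visiting all 2^n inputs. The names `toy_policy`, `toy_is_unsafe`, `check_counterexample`, and `find_violation` are hypothetical and chosen only for this example. The exponential cost of this naive search does not by itself establish hardness; it only shows what the completeness result rules out shortcuts for, assuming P ≠ NP.

```python
from itertools import product
from typing import Callable, Optional, Tuple

Input = Tuple[bool, ...]
Policy = Callable[[Input], bool]
UnsafePredicate = Callable[[Input, bool], bool]


def check_counterexample(policy: Policy, is_unsafe: UnsafePredicate, x: Input) -> bool:
    """Return True if x witnesses a violation. Costs one policy evaluation."""
    return is_unsafe(x, policy(x))


def find_violation(policy: Policy, is_unsafe: UnsafePredicate, n_bits: int) -> Optional[Input]:
    """Brute-force search over all 2**n_bits inputs.

    Returns the first violating input, or None if the policy is safe
    on every input (which requires exhausting the whole space).
    """
    for x in product((False, True), repeat=n_bits):
        if check_counterexample(policy, is_unsafe, x):
            return x
    return None


# Hypothetical toy setup: the policy takes a "risky" action (True) only when
# every input bit is set, and taking the risky action always counts as unsafe.
def toy_policy(x: Input) -> bool:
    return all(x)


def toy_is_unsafe(x: Input, action: bool) -> bool:
    return action


if __name__ == "__main__":
    witness = find_violation(toy_policy, toy_is_unsafe, n_bits=4)
    print("violation:", witness)
    print("always-refuse policy:", find_violation(lambda x: False, toy_is_unsafe, n_bits=4))
```

In this toy setup the only violating input is the all-True vector, which the enumeration reaches last, so the search touches every one of the 2^n inputs before finding it. Once found, the witness is confirmed with a single call to `check_counterexample`.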

Abstract

This paper argues that AI alignment is not merely difficult, but is founded on a fundamental logical contradiction. We first establish The Enumeration Paradox: we use machine learning precisely because we cannot enumerate all necessary safety rules, yet making ML safe requires examples that can only be generated from the very enumeration we admit is impossible. This paradox is then confirmed by a set of five independent mathematical proofs, or "pillars of impossibility." Our main results show that: (1) Geometric Impossibility: The set of safe policies has measure zero, a necessary consequence of projecting infinite-dimensional world-context requirements onto finite-dimensional models. (2) Computational Impossibility: Verifying a policy's safety is coNP-complete, even for non-zero error tolerances. (3) Statistical Impossibility: The training data required for safety (abundant examples of…

Taxonomy

Topics: Ethics and Social Impacts of AI · Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI)

Methods: Sparse Evolutionary Training