On the Formal Limits of Alignment Verification

Ayushi Agarwal

arXiv:2603.08761·stat.ML·March 11, 2026

On the Formal Limits of Alignment Verification

Ayushi Agarwal

PDF

Open Access

TL;DR

This paper demonstrates fundamental theoretical limits on the possibility of formally certifying AI alignment, showing that no verification method can be simultaneously sound, general, and tractable.

Contribution

It proves a trilemma establishing that no verification procedure can satisfy all three properties simultaneously, clarifying the inherent limitations of alignment certification.

Findings

01

No verification method can be fully sound, general, and tractable simultaneously.

02

Relaxing any property allows for practical verification approaches.

03

The limits are due to computational complexity, goal non-identifiability, and finite evidence constraints.

Abstract

The goal of AI alignment is to ensure that an AI system reliably pursues intended objectives. A foundational question for AI safety is whether alignment can be formally certified: whether there exists a procedure that can guarantee that a given system satisfies an alignment specification. This paper studies the nature of alignment verification. We prove that no verification procedure can simultaneously satisfy three properties: soundness (no misaligned system is certified), generality (verification holds over the full input domain), and tractability (verification runs in polynomial time). Each pair of properties is achievable, but all three cannot hold simultaneously. Relaxing any one property restores the corresponding possibility, indicating that practical bounded or probabilistic assurance remains viable. The result follows from three independent barriers: the computational…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsAdversarial Robustness in Machine Learning · Ethics and Social Impacts of AI · Explainable Artificial Intelligence (XAI)