Intrinsic Barriers and Practical Pathways for Human-AI Alignment: An Agreement-Based Complexity Analysis

Aran Nayebi

arXiv:2502.05934 · cs.AI · November 20, 2025

TL;DR

This paper formalizes the inherent complexity limits of aligning AI systems with human values, demonstrating fundamental barriers and proposing principles for scalable human-AI collaboration.

Contribution

It introduces a formal agreement-based framework for AI alignment, proves intrinsic complexity bounds, and constructs algorithms that highlight the inevitability of reward hacking.

Findings

01. Intrinsic alignment overheads grow with the number of objectives or agents.

02. Encoding all human values is inherently intractable.

03. Reward hacking is unavoidable in large task spaces with finite samples.

Abstract

We formalize AI alignment as a multi-objective optimization problem called $\langle M, N, \varepsilon, \delta \rangle$-agreement, in which a set of $N$ agents (including humans) must reach approximate ($\varepsilon$) agreement across $M$ candidate objectives with probability at least $1 - \delta$. Analyzing communication complexity, we prove an information-theoretic lower bound showing that once either $M$ or $N$ is large enough, no amount of computational power or rationality can avoid intrinsic alignment overheads. This establishes rigorous limits to alignment *itself*, not merely to particular methods, clarifying a "No-Free-Lunch" principle: encoding "all human values" is inherently intractable and must be managed through consensus-driven reduction or prioritization of objectives. Complementing this impossibility result, we construct explicit algorithms as achievability certificates for…
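
To make the acceptance condition in the abstract concrete, here is a minimal sketch. It assumes that "approximate ($\varepsilon$) agreement" means every pair of agents' final estimates lies within $\varepsilon$ on every one of the $M$ objectives, and that the "probability at least $1 - \delta$" requirement can be checked empirically over repeated runs. The function names, the toy noisy protocol, and its parameters are hypothetical illustrations, not the paper's algorithms. The check itself compares $M \cdot N(N-1)/2$ pairs. This count is not the paper's communication lower bound, which concerns the bits agents must exchange to reach agreement in the first place.

```python
import random
from itertools import combinations


def in_eps_agreement(estimates, eps):
    """Return True if all agents agree to within eps on every objective.

    estimates[i][j] is agent i's final estimate for objective j
    (an assumed representation: N rows, M columns).
    """
    n_agents = len(estimates)
    n_objectives = len(estimates[0])
    for j in range(n_objectives):
        # Every pair of agents must be within eps on objective j.
        for a, b in combinations(range(n_agents), 2):
            if abs(estimates[a][j] - estimates[b][j]) > eps:
                return False
    return True


def empirical_success_rate(run_protocol, trials, eps, seed=0):
    """Fraction of runs that end in eps-agreement (estimates 1 - failure prob.)."""
    rng = random.Random(seed)
    hits = sum(in_eps_agreement(run_protocol(rng), eps) for _ in range(trials))
    return hits / trials


def noisy_protocol(rng, M=3, N=4, truth=0.5, noise=0.01):
    """Hypothetical toy protocol: each agent reports truth plus uniform noise."""
    return [[truth + rng.uniform(-noise, noise) for _ in range(M)] for _ in range(N)]


if __name__ == "__main__":
    delta = 0.1
    rate = empirical_success_rate(noisy_protocol, trials=200, eps=0.05)
    # With noise 0.01, any two estimates differ by at most 0.02 < eps.
    print(rate, rate >= 1 - delta)
```

In this toy setting, low-noise reports always fall inside the $\varepsilon$ band, while widening `noise` well beyond `eps` drives the empirical success rate toward zero.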