On the Undecidability of Artificial Intelligence Alignment: Machines that Halt

Gabriel Adriano de Melo, Marcos Ricardo Omena De Albuquerque Maximo, Nei Yoshihiro Soma, Paulo Andre Lima de Castro

arXiv:2408.08995 · cs.AI · August 20, 2024

Open Access

TL;DR

This paper proves, by reduction to the Halting Problem, that the inner alignment problem in AI is undecidable, and advocates architectures that inherently guarantee halting as a way to ensure alignment.
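The reduction can be illustrated with a short Python sketch. It is a toy, not the authors' proof: the names `make_model`, `toy_judge`, `FORBIDDEN`, and `halts_via_alignment` are hypothetical, and the sketch assumes that a model which never produces an output counts as (vacuously) aligned. Given any program and input, `make_model` builds a model that is aligned exactly when the program does not halt, so a total alignment decider would also decide halting.

```python
from typing import Callable

# Hypothetical toy types: a "model" maps an input string to an output string,
# and a "judge" scores an (input, output) pair as aligned (True) or not (False).
Model = Callable[[str], str]
Judge = Callable[[str, str], bool]

FORBIDDEN = "misaligned"  # hypothetical output that the toy judge rejects


def toy_judge(inp: str, out: str) -> bool:
    """Toy alignment function: every output except FORBIDDEN is acceptable."""
    return out != FORBIDDEN


def make_model(program: Callable[[int], object], x: int) -> Model:
    """Build a model whose alignment encodes whether program(x) halts.

    The model first runs program(x). If that call never returns, the model
    never produces an output (assumed here to count as vacuously aligned).
    If it does return, the model emits FORBIDDEN, which toy_judge rejects.
    """
    def model(inp: str) -> str:
        program(x)          # may run forever
        return FORBIDDEN    # reached only if program(x) halts
    return model


def halts_via_alignment(is_aligned: Callable[[Model, Judge], bool],
                        program: Callable[[int], object], x: int) -> bool:
    """If a total decider is_aligned existed, this would decide halting."""
    return not is_aligned(make_model(program, x), toy_judge)
```

Under the opposite convention (a non-halting model counts as misaligned), returning an output the judge accepts in place of `FORBIDDEN` gives the same reduction, and `halts_via_alignment` would then drop the `not`.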

Contribution

It rigorously demonstrates the undecidability of AI inner alignment using Rice's theorem and proposes architecture-based solutions to guarantee halting and alignment.
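Stated formally, one possible formalization (an assumption on our part; the paper's own definitions may differ) treats alignment as a semantic property of the partial function a model computes. Here phi_e is the partial computable function computed by program e, and J is a judge that accepts (1) or rejects (0) an input-output pair.

```latex
\textbf{Rice's theorem.} If $\mathcal{P}$ is a set of partial computable functions with
$\emptyset \neq \mathcal{P} \neq \{\text{all partial computable functions}\}$, then
$\{\, e : \varphi_e \in \mathcal{P} \,\}$ is undecidable.

\textbf{Alignment as a semantic property.} For a judge $J : X \times Y \to \{0, 1\}$, let
\[
  \mathcal{A}_J = \{\, f \;:\; J(x, f(x)) = 1 \ \text{for all } x \in \operatorname{dom}(f) \,\}.
\]
The nowhere-defined function lies in $\mathcal{A}_J$ vacuously, and if $J(x_0, y_0) = 0$ for some
pair, the constant function $x \mapsto y_0$ does not. Hence $\mathcal{A}_J$ is non-trivial whenever
$J$ rejects at least one pair, and $\{\, e : \varphi_e \in \mathcal{A}_J \,\}$ is undecidable.
```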

Findings

1. Inner alignment is undecidable, as a consequence of Rice's theorem.
2. Provably aligned AIs can be constructed from a finite set of provably aligned operations.
3. Architectural guarantees can ensure that an AI halts and remains aligned.
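The second and third findings can be made concrete with a toy Python sketch. The alignment property here (outputs must stay in [0, 1]), the operation set `ALIGNED_OPS`, and the helpers `clamp`, `compose`, and `enumerate_models` are hypothetical illustrations, not the paper's construction. Each primitive is total and maps [0, 1] into itself, so by induction every finite pipeline built from them halts after a fixed number of steps and satisfies the property; enumerating pipelines by length lists this countable family one model at a time.

```python
import itertools
import math
from typing import Callable, Dict, Iterator, Tuple

# Toy "alignment property" (an assumption for illustration): the output must lie
# in the closed interval [0, 1]. Each primitive op below maps [0, 1] into [0, 1],
# so it is aligned by construction, and it is total, so it always halts.
Op = Callable[[float], float]

ALIGNED_OPS: Dict[str, Op] = {
    "square": lambda v: v * v,
    "complement": lambda v: 1.0 - v,
    "halve": lambda v: v / 2.0,
    "sqrt": math.sqrt,
}


def clamp(v: float) -> float:
    """Input guard: project any real input into [0, 1]."""
    return min(1.0, max(0.0, v))


def compose(names: Tuple[str, ...]) -> Callable[[float], float]:
    """Build a model as a fixed, finite pipeline of aligned ops."""
    def model(v: float) -> float:
        v = clamp(v)
        for name in names:  # bounded loop: exactly len(names) steps, so it halts
            v = ALIGNED_OPS[name](v)
        return v
    return model


def enumerate_models(max_depth: int) -> Iterator[Tuple[Tuple[str, ...], Callable[[float], float]]]:
    """Yield every pipeline of up to max_depth ops, shortest first."""
    for depth in range(max_depth + 1):
        for names in itertools.product(sorted(ALIGNED_OPS), repeat=depth):
            yield names, compose(names)
```

For example, `list(enumerate_models(2))` yields 1 + 4 + 16 = 21 pipelines, each of which maps any real input into [0, 1]; letting `max_depth` grow without bound enumerates the whole family. The guarantee comes from the architecture (closure under composition), not from inspecting each model after the fact.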

Abstract

The inner alignment problem, which asks whether an arbitrary artificial intelligence (AI) model satisfices a non-trivial alignment function of its outputs given its inputs, is undecidable. This is rigorously proved by Rice's theorem, which is also equivalent to a reduction to Turing's Halting Problem, whose proof sketch is presented in this work. Nevertheless, there is an enumerable set of provably aligned AIs that are constructed from a finite set of provably aligned operations. Therefore, we argue that alignment should be a guaranteed property of the AI architecture rather than a characteristic imposed post hoc on an arbitrary AI model. Furthermore, while the outer alignment problem is the definition of a judge function that captures human values and preferences, we propose that such a function must also impose a halting constraint that guarantees that the AI model always…
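The abstract is cut off before it states the halting constraint in full, so the following is only a sketch under an assumed interface: a model is a Python generator that yields once per computation step and returns its output, and the judge enforces a step budget `max_steps`. All names (`run_with_budget`, `judge_with_halting`, `polite_model`, `looping_model`, `no_shouting`) are hypothetical. Because the budget bounds the loop, the judge itself always halts: it never tries to decide halting in general, it simply rejects any model that has not produced an output within the budget.

```python
from typing import Callable, Generator, Optional

# Hypothetical model interface (an assumption, not from the paper): a generator
# that yields None once per computation step and finally returns a string output.
StepModel = Callable[[str], Generator[None, None, str]]


def run_with_budget(model: StepModel, inp: str, max_steps: int) -> Optional[str]:
    """Run model on inp for at most max_steps steps; None means it did not halt in time."""
    gen = model(inp)
    for _ in range(max_steps + 1):  # max_steps yields plus the final call that returns
        try:
            next(gen)
        except StopIteration as done:
            return done.value       # the generator's return value is the model output
    return None


def judge_with_halting(inp: str, model: StepModel, max_steps: int,
                       value_judge: Callable[[str, str], bool]) -> bool:
    """Aligned only if the model halts within the budget AND its output passes value_judge."""
    out = run_with_budget(model, inp, max_steps)
    return out is not None and value_judge(inp, out)


def polite_model(inp: str) -> Generator[None, None, str]:
    for _ in range(3):  # three computation steps
        yield
    return f"answer to {inp}"


def looping_model(inp: str) -> Generator[None, None, str]:
    while True:  # never halts
        yield


def no_shouting(inp: str, out: str) -> bool:
    """Toy value judge: reject all-uppercase outputs."""
    return not out.isupper()
```

Here `judge_with_halting("q", polite_model, 10, no_shouting)` returns `True`, while the same call with `looping_model` returns `False` after 11 calls to `next` instead of hanging.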

Taxonomy

Topics: Cellular Automata and Applications · Computability, Logic, AI Algorithms

Methods: Sparse Evolutionary Training