Formalizing Embeddedness Failures in Universal Artificial Intelligence

Cole Wyeth; Marcus Hutter

arXiv:2505.17882·cs.AI·May 26, 2025

TL;DR

This paper formalizes the commonly asserted failures of the AIXI reinforcement learning agent as a model of embedded agency and proves that they occur within the framework of universal artificial intelligence.

Contribution

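The paper proves that embeddedness failure modes occur for a variant of AIXI that models the joint action/percept history as drawn from the universal distribution, and evaluates progress towards a theory of embedded agency based on variants of the AIXI agent.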

Findings

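01 The commonly asserted embeddedness failure modes of AIXI are stated formally.

02 These failure modes are proved to occur within the framework of universal artificial intelligence, focusing on a variant of AIXI that models the joint action/percept history as drawn from the universal distribution.

03 Progress towards a successful theory of embedded agency based on variants of the AIXI agent is evaluated.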

Abstract

We rigorously discuss the commonly asserted failures of the AIXI reinforcement learning agent as a model of embedded agency. We attempt to formalize these failure modes and prove that they occur within the framework of universal artificial intelligence, focusing on a variant of AIXI that models the joint action/percept history as drawn from the universal distribution. We also evaluate the progress that has been made towards a successful theory of embedded agency based on variants of the AIXI agent.
