Intelligence and Unambitiousness Using Algorithmic Information Theory
Michael K. Cohen, Badri Vellambi, Marcus Hutter

TL;DR
This paper introduces an "unambitious" variant of AIXI that learns not to seek arbitrary power, accrues reward at least as well as a human mentor, and ends up with a world-model that gives it no incentive to influence the outside world.
Contribution
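It proposes a variant of AIXI that combines an information-theoretic exploration schedule with a setup inspired by causal influence theory, so that the agent learns not to seek arbitrary power. This addresses a safety concern for generally intelligent reinforcement learners.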
Findings
The unambitious AIXI variant learns to accrue reward at least as well as a human mentor.
The agent's world-model eventually shows no incentive to influence the outside world.
The approach relies on an information-theoretic exploration schedule and a setup inspired by causal influence theory.
Abstract
Algorithmic Information Theory has inspired intractable constructions of general intelligence (AGI), and undiscovered tractable approximations are likely feasible. Reinforcement Learning (RL), the dominant paradigm by which an agent might learn to solve arbitrary solvable problems, gives an agent a dangerous incentive: to gain arbitrary "power" in order to intervene in the provision of their own reward. We review the arguments that generally intelligent algorithmic-information-theoretic reinforcement learners such as Hutter's (2005) AIXI would seek arbitrary power, including over us. Then, using an information-theoretic exploration schedule, and a setup inspired by causal influence theory, we present a variant of AIXI which learns to not seek arbitrary power; we call it "unambitious". We show that our agent learns to accrue reward at least as well as a human mentor, while relying on…
