On Avoiding Power-Seeking by Artificial Intelligence

Alexander Matt Turner

arXiv:2206.11831·cs.AI·June 24, 2022

Open Access

TL;DR

This thesis explores how to design AI agents that limit their impact and avoid power-seeking behavior. It proposes the attainable utility preservation (AUP) method and analyzes power-seeking incentives across several decision-making frameworks.

Contribution

Introduces the attainable utility preservation (AUP) method to promote conservative AI behavior and formalizes the problem of side effect avoidance and power-seeking incentives.

Findings

01

AUP produces conservative, option-preserving behavior in toy and complex environments.

02

For most reward functions, optimal policies resist deactivation or correction.

03

Power-seeking incentives are prevalent across various decision-making procedures.

Abstract

We do not know how to align a very intelligent AI agent's behavior with human interests. I investigate whether -- absent a full solution to this AI alignment problem -- we can build smart AI agents which have limited impact on the world, and which do not autonomously seek power. In this thesis, I introduce the attainable utility preservation (AUP) method. I demonstrate that AUP produces conservative, option-preserving behavior within toy gridworlds and within complex environments based on Conway's Game of Life. I formalize the problem of side effect avoidance, which provides a way to quantify the side effects an agent had on the world. I also give a formal definition of power-seeking in the context of AI agents and show that optimal policies tend to seek power. In particular, most reward functions have optimal policies which avoid deactivation. This is a problem if we want to…

Taxonomy

Topics: Decision-Making and Behavioral Economics · Economic Policies and Impacts · Law, Economics, and Judicial Systems

Methods: ALIGN