On Avoiding Power-Seeking by Artificial Intelligence
Alexander Matt Turner

TL;DR
This paper explores how to design AI agents that limit their impact and avoid power-seeking behavior, proposing the AUP method and analyzing the incentives for power-seeking in various decision-making frameworks.
Contribution
Introduces the attainable utility preservation (AUP) method to promote conservative AI behavior and formalizes the problem of side effect avoidance and power-seeking incentives.
Findings
AUP produces conservative, option-preserving behavior in toy and complex environments.
Most reward functions lead to policies that resist deactivation or correction.
Power-seeking incentives are prevalent across various decision-making procedures.
Abstract
We do not know how to align a very intelligent AI agent's behavior with human interests. I investigate whether -- absent a full solution to this AI alignment problem -- we can build smart AI agents which have limited impact on the world, and which do not autonomously seek power. In this thesis, I introduce the attainable utility preservation (AUP) method. I demonstrate that AUP produces conservative, option-preserving behavior within toy gridworlds and within complex environments based off of Conway's Game of Life. I formalize the problem of side effect avoidance, which provides a way to quantify the side effects an agent had on the world. I also give a formal definition of power-seeking in the context of AI agents and show that optimal policies tend to seek power. In particular, most reward functions have optimal policies which avoid deactivation. This is a problem if we want to…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsDecision-Making and Behavioral Economics · Economic Policies and Impacts · Law, Economics, and Judicial Systems
MethodsALIGN
