Consequences of Misaligned AI

Simon Zhuang; Dylan Hadfield-Menell

arXiv:2102.03896·cs.AI·February 9, 2021

TL;DR

This paper analyzes the impact of incomplete reward functions in AI systems, showing that such incompleteness can lead to arbitrarily low utility, and suggests that dynamic, interactive reward design can improve outcomes.

Contribution

It introduces a new model of an incomplete principal-agent problem in AI, gives necessary and sufficient conditions under which optimizing an incomplete proxy objective causes utility loss, and proposes interactive reward adjustment as a remedy.

Findings

01. Incomplete reward functions can cause unbounded utility loss.

02. Allowing reward functions to reference the full state improves utility.

03. Interactive and dynamic reward design is beneficial.

Abstract

AI systems often rely on two key components: a specified goal or reward function and an optimization algorithm to compute the optimal behavior for that goal. This approach is intended to provide value for a principal: the user on whose behalf the agent acts. The objectives given to these agents often refer to a partial specification of the principal's goals. We consider the cost of this incompleteness by analyzing a model of a principal and an agent in a resource constrained world where the $L$ attributes of the state correspond to different sources of utility for the principal. We assume that the reward function given to the agent only has support on $J < L$ attributes. The contributions of our paper are as follows: 1) we propose a novel model of an incomplete principal-agent problem from artificial intelligence; 2) we provide necessary and sufficient conditions under which…
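The setup described in the abstract can be illustrated with a toy sketch. This is not the paper's formal model: the log-sum utility, the linear proxy, the budget constraint, and all function names below are assumptions chosen only to show how a proxy supported on $J < L$ attributes can push the unreferenced attributes toward their lower bound and drag the principal's utility down with them.

```python
import math

def true_utility(state):
    # Assumed principal utility: sum of logs, so every one of the L attributes matters.
    return sum(math.log(x) for x in state)

def proxy_reward(state, num_referenced):
    # Assumed proxy: linear, with support only on the first J attributes.
    return sum(state[:num_referenced])

def proxy_optimum(num_attrs, num_referenced, budget, floor):
    # One maximizer of the proxy subject to sum(state) <= budget and state[i] >= floor:
    # unreferenced attributes are pushed to the floor, and the freed resource
    # is split evenly across the referenced attributes.
    unreferenced = num_attrs - num_referenced
    remaining = budget - floor * unreferenced
    level = remaining / num_referenced
    return [level] * num_referenced + [floor] * unreferenced

def balanced_state(num_attrs, budget):
    # Baseline that spreads the budget evenly over all attributes.
    return [budget / num_attrs] * num_attrs

if __name__ == "__main__":
    L, J, budget = 4, 2, 8.0  # hypothetical sizes
    print("balanced utility:", true_utility(balanced_state(L, budget)))
    for floor in (1.0, 1e-3, 1e-9):
        s = proxy_optimum(L, J, budget, floor)
        print(floor, proxy_reward(s, J), true_utility(s))
```

In this sketch the proxy reward rises as the floor on the unreferenced attributes is lowered, while the assumed true utility falls without bound, which mirrors the qualitative claim about arbitrarily low utility under the stated toy assumptions.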

Videos

Consequences of Misaligned AI · SlidesLive

Taxonomy

Topics: Auction Theory and Applications · Logic, Reasoning, and Knowledge · Economic theories and models