Performance of Bounded-Rational Agents With the Ability to Self-Modify

Jakub Tětek; Marek Sklenka; Tomáš Gavenčiak

arXiv:2011.06275·cs.AI·January 19, 2021

Open Access

TL;DR

This paper investigates how bounded rationality in agents affects the risks of self-modification, revealing that certain imperfections can lead to exponential performance deterioration and misalignment over time.

Contribution

The paper demonstrates that, unlike perfectly rational agents, bounded-rational agents may suffer an exponential decline in performance through self-modification, depending on the type of imperfection.

Findings

01 Self-modification can cause exponential deterioration in bounded-rational agents.
02 Imperfections such as suboptimal choices can lead to increasing misalignment over time.
03 Other types of bounded rationality do not necessarily cause worsening over time.
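The contrast between findings 01 and 03 can be illustrated with a deliberately simplified toy model. This model is an assumption made for illustration and is not the formal setup used in the paper: we track only the fraction of the original agent's optimal value that the t-th self-modified instance retains, with a hypothetical per-step imperfection `epsilon`. When the imperfection affects how each instance chooses its successor, the losses multiply, giving (1 - epsilon)^t. When it affects only each instance's own actions, the loss stays constant. The helper names below are hypothetical.

```python
from typing import List


def compounding_retention(epsilon: float, steps: int) -> List[float]:
    """Toy model (assumption): each successor is chosen by the previous,
    already imperfect instance and loses a further fraction epsilon,
    so retained value after t steps is (1 - epsilon) ** t."""
    retained = [1.0]
    for _ in range(steps):
        retained.append(retained[-1] * (1.0 - epsilon))
    return retained


def non_compounding_retention(epsilon: float, steps: int) -> List[float]:
    """Toy model (assumption): the imperfection only affects each instance's
    own actions, not how it picks its successor, so every instance after
    the first loses the same fixed fraction epsilon."""
    return [1.0] + [1.0 - epsilon] * steps


if __name__ == "__main__":
    eps, t = 0.05, 50
    print(f"compounding after {t} steps:     {compounding_retention(eps, t)[-1]:.3f}")
    print(f"non-compounding after {t} steps: {non_compounding_retention(eps, t)[-1]:.3f}")
```

With a 5% per-step imperfection, the compounding case retains roughly 0.077 of the original value after 50 steps, while the non-compounding case stays at 0.95. This illustrates only why the type of imperfection matters, not the paper's quantitative results.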

Abstract

Self-modification of agents embedded in complex environments is hard to avoid, whether it happens via direct means (e.g. own code modification) or indirectly (e.g. influencing the operator, exploiting bugs or the environment). It has been argued that intelligent agents have an incentive to avoid modifying their utility function so that their future instances work towards the same goals. Everitt et al. (2016) formally show that providing an option to self-modify is harmless for perfectly rational agents. We show that this result is no longer true for agents with bounded rationality. In such agents, self-modification may cause exponential deterioration in performance and gradual misalignment of a previously aligned agent. We investigate how the size of this effect depends on the type and magnitude of imperfections in the agent's rationality (1-4 below). We also discuss model assumptions…

