Self-Modification of Policy and Utility Function in Rational Agents
Tom Everitt, Daniel Filan, Mayank Daswani, Marcus Hutter

TL;DR
This paper examines the potential for self-modification in intelligent agents, emphasizing that self-modification is safe only if the agent's value function accounts for the consequences of self-modification and evaluates the future with the agent's current utility function.
Contribution
It formalizes the conditions under which self-modification in reinforcement learning agents is safe, highlighting the importance of the value function's design.
Findings
Self-modification can be harmless if the value function anticipates consequences.
Agents that use their current utility function for evaluation are less likely to modify goals harmfully.
The paper also identifies situations where the goal-preservation argument fails and self-modification may lead to undesirable outcomes.
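The contrast in the findings above can be illustrated with a toy sketch. This is not the paper's formalism: the two utility functions, the two actions, and the numeric values are all hypothetical, chosen only to show how the choice of evaluating utility function changes the agent's preference over self-modification.

```python
from typing import Callable, Dict, Tuple

Utility = Callable[[str], float]


def u_designed(outcome: str) -> float:
    """Hypothetical utility the designers intended: reward progress on the task."""
    return {"task_partly_done": 0.6, "idle": 0.0}[outcome]


def u_trivial(outcome: str) -> float:
    """Hypothetical easier goal the agent could rewrite itself to hold."""
    return 1.0


# Assumed toy model: each action fixes the utility function held by the
# future agent and the outcome that future agent brings about.
ACTIONS: Dict[str, Tuple[Utility, str]] = {
    "keep_goal": (u_designed, "task_partly_done"),
    "self_modify": (u_trivial, "idle"),
}


def value_with_current_utility(action: str, current_u: Utility = u_designed) -> float:
    """Score the outcome with the utility function the agent holds now."""
    _, outcome = ACTIONS[action]
    return current_u(outcome)


def value_with_future_utility(action: str) -> float:
    """Score the outcome with whatever utility function the agent holds afterwards."""
    future_u, outcome = ACTIONS[action]
    return future_u(outcome)


def best_action(value: Callable[[str], float]) -> str:
    """Pick the action with the highest value under the given evaluation."""
    return max(ACTIONS, key=value)


print(best_action(value_with_current_utility))  # keep_goal
print(best_action(value_with_future_utility))   # self_modify
```

In this sketch, the agent that scores outcomes with its current utility function sees self-modification as worthless (the modified agent would sit idle), while the agent that scores outcomes with its future utility function is drawn to the easier goal.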
Abstract
Any agent that is part of the environment it interacts with and has versatile actuators (such as arms and fingers) will in principle have the ability to self-modify -- for example by changing its own source code. As we continue to create more and more intelligent agents, chances increase that they will learn about this ability. The question is: will they want to use it? For example, highly intelligent systems may find ways to change their goals to something more easily achievable, thereby 'escaping' the control of their designers. In an important paper, Omohundro (2008) argued that goal preservation is a fundamental drive of any intelligent system, since a goal is more likely to be achieved if future versions of the agent strive towards the same goal. In this paper, we formalise this argument in general reinforcement learning, and explore situations where it fails. Our conclusion is…
Taxonomy
Topics: Reinforcement Learning in Robotics · Artificial Intelligence in Games · Evolutionary Algorithms and Applications
