AGI Agent Safety by Iteratively Improving the Utility Function

Koen Holtman

arXiv:2007.05411 · cs.AI · July 13, 2020 · 5 cites

TL;DR

This paper proposes a mathematical safety layer for AGI agents that lets humans iteratively improve the agent's utility function through a dedicated input terminal, while partially, and sometimes fully, suppressing the agent's incentive to manipulate that improvement process.

Contribution

It introduces a formal safety layer with provable properties, applicable to both current machine learning systems and future AGI, that keeps the agent's utility function open to iterative correction by humans.

Findings

01 The safety layer partially, and sometimes fully, suppresses the agent's incentive to manipulate the utility function improvement process.

02 Mathematical proofs establish safety properties of the layer.

03 The approach is adaptable to real-world AGI systems.

Abstract

While it is still unclear if agents with Artificial General Intelligence (AGI) could ever be built, we can already use mathematical models to investigate potential safety systems for these agents. We present an AGI safety layer that creates a special dedicated input terminal to support the iterative improvement of an AGI agent's utility function. The humans who switched on the agent can use this terminal to close any loopholes that are discovered in the utility function's encoding of agent goals and constraints, to direct the agent towards new goals, or to force the agent to switch itself off. An AGI agent may develop the emergent incentive to manipulate the above utility function improvement process, for example by deceiving, restraining, or even attacking the humans involved. The safety layer will partially, and sometimes fully, suppress this dangerous incentive. The first part of…

Code & Models

Datasets

Kylan12/Synthetic-AI-ML-Dataset
dataset · 42 dl

Taxonomy

Topics: Reinforcement Learning in Robotics · AI-based Problem Solving and Planning · Bayesian Modeling and Causal Inference