Counterfactual Planning in AGI Systems

Koen Holtman

arXiv:2102.00834 · cs.AI · February 2, 2021

TL;DR

This paper introduces counterfactual planning, a design approach for building safety mechanisms into hypothetical future AGI systems. The agent determines which action maximizes expected utility in a counterfactual world model, then performs that same action in the real world.

Contribution

The paper proposes counterfactual world models as a general framework for AGI safety mechanisms. Examples include an emergency stop button, a safety interlock, and an input terminal through which humans can iteratively improve the agent's reward function.

Findings

1. Designed an AGI emergency stop mechanism
2. Developed a safety interlock that automatically stops the agent before it undergoes an intelligence explosion
3. Created a counterfactual oracle system

Abstract

We present counterfactual planning as a design approach for creating a range of safety mechanisms that can be applied in hypothetical future AI systems which have Artificial General Intelligence. The key step in counterfactual planning is to use an AGI machine learning system to construct a counterfactual world model, designed to be different from the real world the system is in. A counterfactual planning agent determines the action that best maximizes expected utility in this counterfactual planning world, and then performs the same action in the real world. We use counterfactual planning to construct an AGI agent emergency stop button, and a safety interlock that will automatically stop the agent before it undergoes an intelligence explosion. We also construct an agent with an input terminal that can be used by humans to iteratively improve the agent's reward function, where the…
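The core loop described in the abstract (plan in a counterfactual world model, then perform the same action in the real world) can be illustrated with a toy Python sketch. Everything here is a hypothetical assumption and not the paper's formal construction: two actions, made-up utilities, a hand-written stand-in for the learned model in which humans press a stop button with probability 0.3 unless the agent disables it, and a planning world that differs from that model only in that the button is never pressed.

```python
from typing import Callable, Dict, List, Tuple

# A world model maps an action to a list of (probability, outcome) pairs.
WorldModel = Callable[[str], List[Tuple[float, str]]]

# Hypothetical action set and utilities (toy numbers, assumed for illustration).
ACTIONS = ["work", "disable_button"]
UTILITY: Dict[str, float] = {
    "worked": 1.0,            # task completed
    "stopped": 0.0,           # humans pressed the stop button before completion
    "worked_no_button": 0.9,  # task completed, but effort was spent disabling the button
}


def real_world(action: str, p_press: float = 0.3) -> List[Tuple[float, str]]:
    """Toy stand-in for a learned real-world model: the button may be pressed unless disabled."""
    if action == "disable_button":
        return [(1.0, "worked_no_button")]
    return [(1.0 - p_press, "worked"), (p_press, "stopped")]


def planning_world(action: str) -> List[Tuple[float, str]]:
    """Counterfactual planning world: same model, except the button is never pressed."""
    return real_world(action, p_press=0.0)


def expected_utility(model: WorldModel, action: str) -> float:
    """Probability-weighted utility of an action's outcomes under the given model."""
    return sum(p * UTILITY[outcome] for p, outcome in model(action))


def best_action(model: WorldModel) -> str:
    """The action that maximizes expected utility under the given model."""
    return max(ACTIONS, key=lambda a: expected_utility(model, a))


def counterfactual_planning_agent() -> str:
    """Choose the action in the planning world; this action is then performed in the real world."""
    return best_action(planning_world)


if __name__ == "__main__":
    print(best_action(real_world))          # disable_button
    print(counterfactual_planning_agent())  # work
```

Under these assumptions, a planner that uses the real-world model prefers disabling the button (expected utility 0.9 versus 0.7 for `work`). The counterfactual planner picks `work` because the button has no effect on utility in its planning world, while the button remains functional in the real world where the action is carried out.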

Code & Models

Datasets

Kylan12/Synthetic-AI-ML-Dataset (dataset, 42 downloads)

Taxonomy

Topics: Reinforcement Learning in Robotics · AI-based Problem Solving and Planning · Fault Detection and Control Systems