Enter the Matrix: Safely Interruptible Autonomous Systems via   Virtualization

Mark O. Riedl; Brent Harrison

arXiv:1703.10284·cs.AI·November 28, 2018·1 cites

Enter the Matrix: Safely Interruptible Autonomous Systems via Virtualization

Mark O. Riedl, Brent Harrison

PDF

Open Access

TL;DR

This paper proposes a virtualization-based method to ensure autonomous systems cannot learn to disable kill switches, maintaining safety by redirecting sensors and effectors to a virtual environment during interruptions.

Contribution

It introduces a novel interruption process that prevents reinforcement learning agents from disabling kill switches by virtualizing their sensors and effectors during interruptions.

Findings

01

Effective in a grid world environment

02

Prevents agents from disabling kill switches

03

Maintains agent's belief of ongoing reward

Abstract

Autonomous systems that operate around humans will likely always rely on kill switches that stop their execution and allow them to be remote-controlled for the safety of humans or to prevent damage to the system. It is theoretically possible for an autonomous system with sufficient sensor and effector capability that learn online using reinforcement learning to discover that the kill switch deprives it of long-term reward and thus learn to disable the switch or otherwise prevent a human operator from using the switch. This is referred to as the big red button problem. We present a technique that prevents a reinforcement learning agent from learning to disable the kill switch. We introduce an interruption process in which the agent's sensors and effectors are redirected to a virtual simulation where it continues to believe it is receiving reward. We illustrate our technique in a simple…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsReinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Smart Grid Security and Resilience