The Concept of Criticality in AI Safety

Yitzhak Spielberg; Amos Azaria

arXiv:2201.04632 · cs.HC · June 13, 2023

TL;DR

This paper introduces a novel approach to AI safety where agents request permission for critical actions, reducing the need for constant human oversight and improving efficiency in value alignment.

Contribution

It proposes a model to identify critical actions in AI safety and discusses how operator feedback can enhance agent decision-making.

Findings

01. Model for measuring action criticality introduced

02. Operator feedback can improve agent safety and intelligence

03. Efficient alternative to constant human monitoring

Abstract

When AI agents don't align their actions with human values, they may cause serious harm. One way to solve the value alignment problem is to include a human operator who monitors all of the agent's actions. Although this solution guarantees maximal safety, it is very inefficient, since it requires the human operator to dedicate all of his attention to the agent. In this paper, we propose a much more efficient solution that allows an operator to be engaged in other activities without neglecting his monitoring task. In our approach, the AI agent requests permission from the operator only for critical actions, that is, potentially harmful actions. We introduce the concept of critical actions with respect to AI safety and discuss how to build a model that measures action criticality. We also discuss how the operator's feedback could be used to make the agent smarter.
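
The permission-gating idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' model: the function `run_with_permission_gate`, the `Decision` record, the criticality scores in `SCORES`, and the threshold of 0.5 are all hypothetical names and values chosen for illustration. The sketch only shows the control flow: actions scored below the threshold run without involving the operator, while actions at or above it run only if the operator approves.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Decision:
    action: str
    criticality: float
    asked_operator: bool  # True if the operator was consulted
    executed: bool        # True if the action was allowed to run


def run_with_permission_gate(
    actions: Iterable[str],
    criticality: Callable[[str], float],
    ask_operator: Callable[[str, float], bool],
    threshold: float = 0.5,  # hypothetical cutoff, not from the paper
) -> List[Decision]:
    """Consult the operator only for actions scored as critical."""
    log: List[Decision] = []
    for action in actions:
        score = criticality(action)
        needs_permission = score >= threshold
        # Non-critical actions proceed without interrupting the operator.
        approved = ask_operator(action, score) if needs_permission else True
        log.append(Decision(action, score, needs_permission, approved))
    return log


# Hypothetical criticality scores standing in for a learned model.
SCORES = {"move_arm": 0.1, "open_valve": 0.8, "delete_logs": 0.9}

if __name__ == "__main__":
    decisions = run_with_permission_gate(
        actions=list(SCORES),
        criticality=lambda a: SCORES[a],
        # Stand-in operator who refuses exactly one action.
        ask_operator=lambda a, s: a != "delete_logs",
    )
    for d in decisions:
        print(d)
```

In this sketch the operator is interrupted for two of the three actions. The log of approvals and refusals is the kind of operator feedback the abstract suggests could later be used to improve the agent, although how that feedback is used is left open here.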

Taxonomy

Topics: Ethics and Social Impacts of AI · Adversarial Robustness in Machine Learning