Minimum Levels of Interpretability for Artificial Moral Agents

Avish Vijayaraghavan; Cosmin Badea

arXiv:2307.00660·cs.AI·July 4, 2023

Open Access

TL;DR

This paper discusses why interpretability matters for artificial moral agents (AMAs), introduces the concept of the Minimum Level of Interpretability (MLI), and recommends an MLI for various types of agents to aid their safe deployment.

Contribution

The paper introduces the concept of the MLI for artificial moral agents and recommends an MLI for various types of agents, so that their reasoning can be trusted, understood, and corrected.

Findings

01. Proposes the concept of the Minimum Level of Interpretability (MLI)

02. Recommends interpretability standards for different types of AMA

03. Aids in safe deployment of AI in moral decision-making

Abstract

As artificial intelligence (AI) models continue to scale up, they are becoming more capable and integrated into various forms of decision-making systems. For models involved in moral decision-making, also known as artificial moral agents (AMA), interpretability provides a way to trust and understand the agent's internal reasoning mechanisms for effective use and error correction. In this paper, we provide an overview of this rapidly evolving sub-field of AI interpretability, introduce the concept of the Minimum Level of Interpretability (MLI) and recommend an MLI for various types of agents, to aid their safe deployment in real-world settings.

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI)