Arguments about Highly Reliable Agent Designs as a Useful Path to Artificial Intelligence Safety

Issa Rice; David Manheim

arXiv:2201.02950·cs.AI·January 11, 2022

Open Access

TL;DR

This paper analyzes the controversial HRAD approach to AI safety by examining four central arguments, clarifying assumptions, and discussing potential strengths and weaknesses to inform future safety strategies.

Contribution

It systematically reviews and clarifies the core arguments supporting HRAD, highlighting their assumptions, claims, and the debates surrounding their effectiveness for AI safety.

Findings

01. Identifies four central arguments: incidental utility, deconfusion, precise specification, and prediction.

02. Provides a detailed review of the assumptions and claims behind each argument.

03. Discusses counterarguments to, and limitations of, HRAD as an AI safety approach.

Abstract

Several different approaches exist for ensuring the safety of future Transformative Artificial Intelligence (TAI) or Artificial Superintelligence (ASI) systems, and proponents of different approaches have made different and debated claims about the importance or usefulness of their work in the near term, and for future systems. Highly Reliable Agent Designs (HRAD) is one of the most controversial and ambitious approaches, championed by the Machine Intelligence Research Institute, among others, and various arguments have been made about whether and how it reduces risks from future AI systems. In order to reduce confusion in the debate about AI safety, here we build on a previous discussion by Rice which collects and presents four central arguments which are used to justify HRAD as a path towards safety of AI systems. We have titled the arguments (1) incidental utility, (2) deconfusion, …

Taxonomy

Topics: Ethics and Social Impacts of AI · Adversarial Robustness in Machine Learning · Neuroethics, Human Enhancement, Biomedical Innovations