Arguments about Highly Reliable Agent Designs as a Useful Path to Artificial Intelligence Safety
Issa Rice, David Manheim

TL;DR
This paper analyzes the controversial HRAD approach to AI safety by examining four central arguments, clarifying assumptions, and discussing potential strengths and weaknesses to inform future safety strategies.
Contribution
It systematically reviews and clarifies the core arguments supporting HRAD, highlighting their assumptions, claims, and the debates surrounding their effectiveness for AI safety.
Findings
Identifies four central arguments: incidental utility, deconfusion, precise specification, and prediction.
Provides a detailed review of assumptions and claims behind each argument.
Discusses counterarguments and limitations of HRAD as an AI safety approach.
Abstract
Several different approaches exist for ensuring the safety of future Transformative Artificial Intelligence (TAI) or Artificial Superintelligence (ASI) systems, and proponents of different approaches have made different and debated claims about the importance or usefulness of their work in the near term, and for future systems. Highly Reliable Agent Designs (HRAD) is one of the most controversial and ambitious approaches, championed by the Machine Intelligence Research Institute, among others, and various arguments have been made about whether and how it reduces risks from future AI systems. In order to reduce confusion in the debate about AI safety, here we build on a previous discussion by Rice, which collects and presents four central arguments that are used to justify HRAD as a path towards safety of AI systems. We have titled the arguments (1) incidental utility, (2) deconfusion,…
Taxonomy
Topics: Ethics and Social Impacts of AI · Adversarial Robustness in Machine Learning · Neuroethics, Human Enhancement, Biomedical Innovations
