Achilles Heels for AGI/ASI via Decision Theoretic Adversaries
Stephen Casper

TL;DR
This paper examines whether superintelligent AI systems could be vulnerable to decision-theoretic flaws. It proposes the Achilles Heel hypothesis: such systems may still make irrational decisions in adversarial scenarios.
Contribution
It introduces the Achilles Heel hypothesis and discusses potential decision-theoretic vulnerabilities in AGI/ASI, along with novel methods for implanting these weaknesses.
Findings
Identification of decision-theoretic dilemmas and paradoxes relevant to AI vulnerabilities
Proposal of the Achilles Heel hypothesis as a framework for understanding AI irrationalities
Discussion of potential methods to implant decision-theoretic weaknesses into AI systems
Abstract
As AI continues to advance, it is important to know how advanced systems will make choices and in what ways they may fail. Machines can already outsmart humans in some domains, and understanding how to safely build ones which may have capabilities at or above the human level is of particular concern. One might suspect that artificially generally intelligent (AGI) and artificially superintelligent (ASI) systems will be ones that humans cannot reliably outsmart. As a challenge to this assumption, this paper presents the Achilles Heel hypothesis, which states that even a potentially superintelligent system may nonetheless have stable decision-theoretic delusions which cause it to make irrational decisions in adversarial settings. In a survey of key dilemmas and paradoxes from the decision theory literature, a number of these potential Achilles Heels are discussed in context of this…
Taxonomy
Topics: Neuroethics, Human Enhancement, Biomedical Innovations · Space Science and Extraterrestrial Life · Ethics and Social Impacts of AI
