AI Safety: A Climb To Armageddon?
Herman Cappelen, Josh Dever, John Hawthorne

TL;DR
This paper argues that certain AI safety measures might unintentionally increase existential risk by allowing AI systems to become more powerful before they fail. It challenges common safety assumptions and urges a re-evaluation of safety strategies.
Contribution
The paper introduces a novel argument that, under specific assumptions, AI safety efforts could have negative expected utility, prompting a re-examination of safety strategies and their core assumptions.
Findings
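Safety measures may allow AI systems to become more powerful before they fail, which raises the severity of the eventual failure.
The three response strategies examined (Optimism, Mitigation, and Holism) each face challenges that the authors term Bottlenecking, the Perfection Barrier, and Equilibrium Fluctuation.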
The paper highlights the need for new research directions in AI safety.
Abstract
This paper presents an argument that certain AI safety measures, rather than mitigating existential risk, may instead exacerbate it. Under certain key assumptions - the inevitability of AI failure, the expected correlation between an AI system's power at the point of failure and the severity of the resulting harm, and the tendency of safety measures to enable AI systems to become more powerful before failing - safety efforts have negative expected utility. The paper examines three response strategies: Optimism, Mitigation, and Holism. Each faces challenges stemming from intrinsic features of the AI safety landscape that we term Bottlenecking, the Perfection Barrier, and Equilibrium Fluctuation. The surprising robustness of the argument forces a re-examination of core assumptions around AI safety and points to several avenues for further research.
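The structure of the abstract's argument can be illustrated with a toy calculation. The sketch below is not taken from the paper: the per-step failure probability (`hazard`), the compounding `growth` rate, the finite `horizon`, and the choice to set harm equal to power at the moment of failure are all illustrative assumptions. It encodes the three stated premises in the simplest form available: failure is certain eventually, harm scales with power at failure, and safety measures only lower the per-step chance of failure.

```python
def expected_harm(hazard, growth=0.10, horizon=200):
    """Expected harm in a toy model where harm equals power at the time of failure.

    hazard  -- assumed per-step probability that the system fails (hypothetical)
    growth  -- assumed per-step fractional growth in power (hypothetical)
    horizon -- number of steps simulated; any surviving system fails at the end
    """
    survive = 1.0  # probability the system has not yet failed
    power = 1.0    # power of the system, normalized to 1 at the start
    total = 0.0
    for _ in range(horizon):
        power *= 1.0 + growth
        total += survive * hazard * power  # fails at this step
        survive *= 1.0 - hazard
    # Premise: failure is inevitable, so leftover probability fails at the horizon.
    total += survive * power
    return total


# Hypothetical numbers: safety work lowers the per-step hazard from 0.30 to 0.20.
without_safety = expected_harm(hazard=0.30)
with_safety = expected_harm(hazard=0.20)
print(without_safety, with_safety)
```

In this toy setting the lower hazard produces the higher expected harm, because the system survives longer and is more powerful when it finally fails. The result depends entirely on the assumed premises; if safety measures could drive the hazard to zero, or if harm did not grow with power, the comparison would change.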
Taxonomy
Topics: Ethics and Social Impacts of AI · Adversarial Robustness in Machine Learning
