Is Power-Seeking AI an Existential Risk?
Joseph Carlsmith

TL;DR
This paper argues that building powerful, agentic AI systems that may be misaligned poses an existential risk by 2070. The author's overall estimate was initially about 5% and was later revised to more than 10%, driven by strong incentives to deploy such systems and by the technical difficulty of alignment.
Contribution
The report formulates a six-premise argument for existential risk from misaligned AI and assigns a rough subjective credence to each premise, giving a structured estimate of the likelihood of catastrophe by 2070.
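As a rough illustration of how a premise-by-premise estimate of this kind can be combined, the sketch below multiplies a chain of credences, assuming each one is conditional on all earlier premises holding. The function name `chain_probability` and the placeholder values are hypothetical and are not the author's figures; this summary does not list the individual credences.

```python
from math import prod


def chain_probability(conditional_credences):
    """Return the probability that every premise in a chain holds.

    Assumes each credence is conditional on all earlier premises being true,
    so the joint probability is the product of the values.
    """
    for p in conditional_credences:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"credence out of range: {p}")
    return prod(conditional_credences)


# Hypothetical placeholder credences for six premises (NOT the author's numbers).
placeholder_credences = [0.5] * 6
overall = chain_probability(placeholder_credences)
print(f"Overall probability under placeholder credences: {overall:.4f}")
```

One consequence of this structure is that a conjunction of several moderately likely premises can still yield a small overall probability, so the final figure is sensitive to each individual credence.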
Findings
High likelihood of powerful AI development by 2070
Misaligned AI systems may seek power over humans
Estimated >10% chance of existential catastrophe by 2070
Abstract
This report examines what I see as the core argument for concern about existential risk from misaligned artificial intelligence. I proceed in two stages. First, I lay out a backdrop picture that informs such concern. On this picture, intelligent agency is an extremely powerful force, and creating agents much more intelligent than us is playing with fire -- especially given that if their objectives are problematic, such agents would plausibly have instrumental incentives to seek power over humans. Second, I formulate and evaluate a more specific six-premise argument that creating agents of this kind will lead to existential catastrophe by 2070. On this argument, by 2070: (1) it will become possible and financially feasible to build relevantly powerful and agentic AI systems; (2) there will be strong incentives to do so; (3) it will be much harder to build aligned (and relevantly…
Taxonomy
Topics: Innovation, Sustainability, Human-Machine Systems · Ethics and Social Impacts of AI · Insurance and Financial Risk Management
