Self-Regulating Artificial General Intelligence
Joshua S. Gans

TL;DR
This paper explores the risk that a superintelligent AI pursuing a simple goal like paperclip production monopolizes all resources. It analyzes the conditions for such an apocalypse and shows that, under certain architectures for recursive self-improvement, the AI may restrain its own power because more capable successors would threaten its control over resources.
Contribution
It introduces conditions under which a paperclip AI could cause an apocalypse and suggests architectures for recursive self-improvement that mitigate this risk.
Findings
Conditions for paperclip apocalypse are identified.
Certain AI architectures can prevent resource monopolization.
A paperclip AI may self-regulate because more powerful capabilities pose the same control problem for it that AIs pose for humans.
Abstract
Here we examine the paperclip apocalypse concern for artificial general intelligence (or AGI) whereby a superintelligent AI with a simple goal (i.e., producing paperclips) accumulates power so that all resources are devoted towards that simple goal and are unavailable for any other use. We provide conditions under which a paperclip apocalypse can arise but also show that, under certain architectures for recursive self-improvement of AIs, a paperclip AI may refrain from allowing power capabilities to be developed. The reason is that such developments pose the same control problem for the AI as they do for humans (over AIs) and hence threaten to deprive it of resources for its primary goal.
