Will artificial agents pursue power by default?
Christian Tarsney

TL;DR
This paper formalizes instrumental convergence and power-seeking in AI agents. It concludes that the claim that power is a convergent instrumental goal is partly true, but that how well this claim predicts behavior depends on the agent's final goals and available options. The result has implications for AI safety.
Contribution
It provides a decision-theoretic framework to analyze power-seeking behavior and clarifies when such behavior is likely to occur in AI agents.
Findings
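The claim that power is a convergent instrumental goal contains an element of truth, but it may have limited predictive utility: an agent's options cannot always be ranked by power without information about its final goals.
Instrumental convergence is more predictive for agents that have a good chance of attaining absolute or near-absolute power.
The formal framework offers a way to assess whether advanced AI agents should be expected to pursue power by default.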
Abstract
Researchers worried about catastrophic risks from advanced AI have argued that we should expect sufficiently capable AI agents to pursue power over humanity because power is a convergent instrumental goal, something that is useful for a wide range of final goals. Others have recently expressed skepticism of these claims. This paper aims to formalize the concepts of instrumental convergence and power-seeking in an abstract, decision-theoretic framework, and to assess the claim that power is a convergent instrumental goal. I conclude that this claim contains at least an element of truth, but might turn out to have limited predictive utility, since an agent's options cannot always be ranked in terms of power in the absence of substantive information about the agent's final goals. However, the fact of instrumental convergence is more predictive for agents who have a good shot at attaining…
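The toy sketch below illustrates the abstract's two claims. It rests on assumptions that are not the paper's own formalism. Outcomes are numbered, and an option is modeled as the set of outcomes it keeps reachable. A final goal is a random utility over outcomes. "At least as much power" is read as a superset relation. The names `best_value`, `preference_rate` and `at_least_as_powerful` are hypothetical.

Under these assumptions, an option that keeps open every outcome another option does is never worse, whatever the goal. For two incomparable options, which one is better depends on the sampled goal.

```python
import random

# Hypothetical toy model (not the paper's framework): an "option" is the set of
# outcome indices it keeps reachable; a "final goal" is a utility per outcome.

def best_value(option, utility):
    """Value of an option: utility of the best outcome it keeps reachable."""
    return max(utility[o] for o in option)

def preference_rate(a, b, n_outcomes, trials=10_000, seed=0):
    """Fraction of randomly sampled goals (i.i.d. uniform utilities) for which
    option a is at least as good as option b."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        utility = [rng.random() for _ in range(n_outcomes)]
        if best_value(a, utility) >= best_value(b, utility):
            wins += 1
    return wins / trials

def at_least_as_powerful(a, b):
    """Assumed power ordering: a keeps open every outcome b keeps open.
    This is only a partial order, so some pairs of options are incomparable."""
    return set(b) <= set(a)

# A superset option is weakly preferred for every sampled goal.
big, small = {0, 1, 2, 3}, {0, 1}
print(at_least_as_powerful(big, small), preference_rate(big, small, 4))  # True 1.0

# Incomparable options: the ranking depends on the goal.
a, b = {0, 1, 2}, {3, 4}
print(at_least_as_powerful(a, b), at_least_as_powerful(b, a))  # False False
print(preference_rate(a, b, 5))  # roughly 0.6, i.e. 3 of 5 outcomes
```

In the incomparable case, the rate only reflects how many outcomes each option keeps open under this particular (assumed) distribution over goals. It does not give a goal-independent power ranking, which is the gap the abstract points to.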
Taxonomy
Topics: Ethics and Social Impacts of AI · Space Science and Extraterrestrial Life · Innovation, Sustainability, Human-Machine Systems
