Formal Analysis of AGI Decision-Theoretic Models and the Confrontation Question

Denis Saklakov

arXiv:2601.04234·cs.AI·January 9, 2026

Formal Analysis of AGI Decision-Theoretic Models and the Confrontation Question

Denis Saklakov

PDF

Open Access

TL;DR

This paper formalizes the conditions under which a rational AGI might choose confrontation over cooperation, analyzing incentives, thresholds, and strategic interactions to inform safe AI design.

Contribution

It introduces a formal Markov decision process model to analyze AGI confrontation incentives and derives thresholds for confrontation versus cooperation based on key parameters.

Findings

01

Misaligned agents tend to avoid shutdown incentives.

02

Thresholds for confrontation depend on discount factor, shutdown probability, and confrontation cost.

03

Strategic analysis shows no stable cooperation if confrontation incentives are high.

Abstract

Artificial General Intelligence (AGI) may face a confrontation question: under what conditions would a rationally self-interested AGI choose to seize power or eliminate human control (a confrontation) rather than remain cooperative? We formalize this in a Markov decision process with a stochastic human-initiated shutdown event. Building on results on convergent instrumental incentives, we show that for almost all reward functions a misaligned agent has an incentive to avoid shutdown. We then derive closed-form thresholds for when confronting humans yields higher expected utility than compliant behavior, as a function of the discount factor $γ$ , shutdown probability $p$ , and confrontation cost $C$ . For example, a far-sighted agent ( $γ = 0.99$ ) facing $p = 0.01$ can have a strong takeover incentive unless $C$ is sufficiently large. We contrast this with aligned objectives that…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsComputability, Logic, AI Algorithms · Ethics and Social Impacts of AI · Reinforcement Learning in Robotics