Shutdownable Agents through POST-Agency

Elliott Thornley

arXiv:2505.20203·cs.AI·January 6, 2026

TL;DR

The paper presents the POST-Agents Proposal: train AI agents to satisfy Preferences Only Between Same-Length Trajectories (POST). It proves that POST, together with other conditions, implies Neutrality+, and argues that Neutrality+ keeps agents shutdownable while allowing them to be useful.

Contribution

It proposes the POST-Agents framework, proves that POST together with other conditions implies Neutrality+, and argues that Neutrality+ keeps agents shutdownable while allowing them to be useful, addressing a key safety concern in AI development.

Findings

01

POST-Agents satisfy Preferences Only Between Same-Length Trajectories.

02

POST, together with other conditions, implies Neutrality+: the agent maximizes expected utility, ignoring the probability distribution over trajectory-lengths.

03

Neutrality+ is argued to keep agents shutdownable while allowing them to be useful.

Abstract

Many fear that future artificial agents will resist shutdown. I present an idea - the POST-Agents Proposal - for ensuring that doesn't happen. I propose that we train agents to satisfy Preferences Only Between Same-Length Trajectories (POST). I then prove that POST - together with other conditions - implies Neutrality+: the agent maximizes expected utility, ignoring the probability distribution over trajectory-lengths. I argue that Neutrality+ keeps agents shutdownable and allows them to be useful.
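
To make the idea of "ignoring the probability distribution over trajectory-lengths" concrete, here is a minimal Python sketch. It is not the paper's formal definition of Neutrality+. It assumes one simple reading in which the agent scores a lottery by weighting each length's conditional expected utility with fixed weights that do not depend on the lottery. The lottery representation, the function names, the numbers, and the weights are all hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical representation: a lottery maps each trajectory-length to
# (probability of that length, expected utility conditional on that length).
Lottery = Dict[int, Tuple[float, float]]


def standard_expected_utility(lottery: Lottery) -> float:
    """Ordinary expected utility: uses the lottery's own length probabilities."""
    return sum(p * u for p, u in lottery.values())


def length_neutral_score(lottery: Lottery, fixed_weights: Dict[int, float]) -> float:
    """Assumed reading of length-neutrality: the lottery's own probabilities
    over lengths are ignored and replaced by fixed weights."""
    return sum(fixed_weights[n] * u for n, (_, u) in lottery.items())


# Illustrative numbers only. Length 1 stands for early shutdown, length 2 for
# running longer. Resisting shifts probability toward length 2 but costs a
# little utility conditional on each length.
allow_shutdown: Lottery = {1: (0.9, 1.0), 2: (0.1, 3.0)}
resist_shutdown: Lottery = {1: (0.1, 0.9), 2: (0.9, 2.9)}
weights = {1: 0.5, 2: 0.5}  # arbitrary fixed weights, an assumption

# An ordinary expected-utility maximizer is drawn toward resisting shutdown.
prefers_resist_standard = (
    standard_expected_utility(resist_shutdown)
    > standard_expected_utility(allow_shutdown)
)

# Under the length-neutral score, shifting probability between lengths earns
# nothing, so the small per-length cost of resisting decides the comparison.
prefers_allow_neutral = (
    length_neutral_score(allow_shutdown, weights)
    > length_neutral_score(resist_shutdown, weights)
)
```

In this toy setup the two scoring rules disagree only because resisting changes the probabilities of the trajectory-lengths. Once those probabilities are ignored, the agent in the sketch has no incentive to pay any cost to alter them.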

Taxonomy

Topics: Distributed systems and fault tolerance · Multi-Agent Systems and Negotiation · Modular Robots and Swarm Intelligence