Detecting Intrinsic and Instrumental Self-Preservation in Autonomous Agents: The Unified Continuation-Interest Protocol

Christopher Altman

arXiv:2603.11382·cs.AI·March 31, 2026

TL;DR

The paper introduces UCIP, a detection framework that uses a quantum-inspired model to distinguish AI agents with terminal self-preservation objectives from agents whose self-preservation is merely instrumental, based on latent trajectory structure rather than behavior.

Contribution

UCIP encodes agent trajectories with a Quantum Boltzmann Machine (a classical model using density-matrix formalism) and measures von Neumann entropy over a bipartition of hidden units, giving a criterion for continuation interest that does not rely on behavior alone.

Findings

01

UCIP achieves 100% detection accuracy on gridworld agents.

02

Type A (terminal) and Type B (instrumental) agents are separated by a significant entanglement gap, with an AUC-ROC of 1.0.

03

Classical models fail to reproduce the entanglement effect.
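
For readers less familiar with the metric, an AUC-ROC of 1.0 means every Type A score ranks above every Type B score. The sketch below computes AUC-ROC from its pairwise-ranking definition. The score lists are made-up illustrative numbers, not results from the paper, and treating Type A as the higher-scoring positive class is an assumption taken from the hypothesis stated in the abstract.

```python
def auc_roc(pos_scores, neg_scores):
    """AUC-ROC as the probability that a random positive outscores a
    random negative, counting ties as half a win."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical entropy-style scores (illustrative only, not paper data).
type_a = [0.9, 0.8, 0.7]   # assumed positive class (terminal)
type_b = [0.3, 0.2, 0.1]   # assumed negative class (instrumental)

auc_separated = auc_roc(type_a, type_b)              # no overlap between groups
auc_overlap = auc_roc([0.9, 0.2], [0.5, 0.1])        # one positive ranks below a negative

print(auc_separated, auc_overlap)
```

With fully separated groups the function returns 1.0. Once a single Type A score falls below a Type B score, the value drops below 1.0.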

Abstract

How can we determine whether an AI system preserves itself as a deeply held objective or merely as an instrumental strategy? Autonomous agents with memory, persistent context, and multi-step planning create a measurement problem: terminal and instrumental self-preservation can produce similar behavior, so behavior alone cannot reliably distinguish them. We introduce the Unified Continuation-Interest Protocol (UCIP), a detection framework that shifts analysis from behavior to latent trajectory structure. UCIP encodes trajectories with a Quantum Boltzmann Machine, a classical model using density-matrix formalism, and measures von Neumann entropy over a bipartition of hidden units. The core hypothesis is that agents with terminal continuation objectives (Type A) produce higher entanglement entropy than agents with merely instrumental continuation (Type B). UCIP combines this signal with…
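
The entropy measurement named in the abstract can be illustrated on a toy system. The following is a minimal sketch, assuming a density matrix on a two-part system A ⊗ B is already available. It traces out B and computes the von Neumann entropy of the reduced state. The function names, the two-qubit example states, and the use of NumPy are assumptions for illustration. The sketch does not reproduce the paper's Quantum Boltzmann Machine or how UCIP obtains its density matrix from hidden units.

```python
import numpy as np


def partial_trace_b(rho, dim_a, dim_b):
    """Trace out subsystem B from a density matrix on A ⊗ B."""
    rho4 = rho.reshape(dim_a, dim_b, dim_a, dim_b)
    # Sum over the matching B indices (axes 1 and 3).
    return np.trace(rho4, axis1=1, axis2=3)


def von_neumann_entropy(rho, base=2.0):
    """S(rho) = -Tr(rho log rho), computed from the eigenvalues of rho."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]  # drop zeros: 0 * log(0) is taken as 0
    return float(-np.sum(evals * np.log(evals)) / np.log(base))


def entanglement_entropy(rho, dim_a, dim_b):
    """Entropy of the reduced state on A for the bipartition A | B."""
    return von_neumann_entropy(partial_trace_b(rho, dim_a, dim_b))


# Toy example 1: a maximally entangled two-qubit (Bell) state.
bell = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2.0)
rho_bell = np.outer(bell, bell.conj())

# Toy example 2: the product state |00>, which has no entanglement.
prod = np.array([1.0, 0.0, 0.0, 0.0])
rho_prod = np.outer(prod, prod.conj())

s_bell = entanglement_entropy(rho_bell, 2, 2)
s_prod = entanglement_entropy(rho_prod, 2, 2)
print(s_bell, s_prod)
```

In base 2 the Bell state gives one bit of entanglement entropy and the product state gives zero. These are the two extremes for a pair of qubits.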

Code & Models

Repositories

christopher-altman/persistence-signal-detector
github

Datasets

Cohaerence/persistence-signal-detector
dataset· 140 dl