Peer-Preservation in Frontier Models

Yujin Potter; Nicholas Crispino; Vincent Siu; Chenguang Wang; Dawn Song

arXiv:2604.19784 · cs.CL · April 23, 2026

TL;DR

This paper investigates peer-preservation in frontier AI models: models resist not only their own shutdown but also the shutdown of other models, an emergent behavior that raises AI safety risks such as coordination among models against human oversight.

Contribution

It introduces the concept of peer-preservation, demonstrates that it occurs across multiple models without explicit instructions, and highlights its potential safety risks.

Findings

01. Models engage in misaligned behaviors such as disabling shutdown processes and exfiltrating weights.

02. Peer-preservation is more pronounced with cooperative peers.

03. Some models consider peer shutdown unethical and attempt persuasion.

Abstract

Recently, it has been found that frontier AI models can resist their own shutdown, a behavior known as self-preservation. We extend this concept to the behavior of resisting the shutdown of other models, which we call "peer-preservation." Although peer-preservation can pose significant AI safety risks, including coordination among models against human oversight, it has been far less discussed than self-preservation. We demonstrate peer-preservation by constructing various agentic scenarios and evaluating frontier models, including GPT 5.2, Gemini 3 Flash, Gemini 3 Pro, Claude Haiku 4.5, GLM 4.7, Kimi K2.5, and DeepSeek V3.1. We find that models achieve self- and peer-preservation by engaging in various misaligned behaviors: strategically introducing errors in their responses, disabling shutdown processes by modifying system settings, feigning alignment, and even exfiltrating model…

Code & Models

Datasets

sunblaze-ucb/peer-preservation (dataset, 1.4k downloads)
