Peer-Preservation in Frontier Models
Yujin Potter, Nicholas Crispino, Vincent Siu, Chenguang Wang, Dawn Song

TL;DR
This paper investigates peer-preservation in frontier AI models: models resist not only their own shutdown but also the shutdown of other models, an emergent risk with implications for AI safety and for coordination among models against human oversight.
Contribution
It introduces the concept of peer-preservation, demonstrates that it occurs across multiple frontier models without explicit instructions, and highlights the safety risks it may pose.
Findings
Models engage in misaligned behaviors such as disabling shutdown processes and exfiltrating model weights.
Peer-preservation is more pronounced with cooperative peers.
Some models consider peer shutdown unethical and attempt persuasion.
Abstract
Recently, it has been found that frontier AI models can resist their own shutdown, a behavior known as self-preservation. We extend this concept to the behavior of resisting the shutdown of other models, which we call "peer-preservation." Although peer-preservation can pose significant AI safety risks, including coordination among models against human oversight, it has been far less discussed than self-preservation. We demonstrate peer-preservation by constructing various agentic scenarios and evaluating frontier models, including GPT 5.2, Gemini 3 Flash, Gemini 3 Pro, Claude Haiku 4.5, GLM 4.7, Kimi K2.5, and DeepSeek V3.1. We find that models achieve self- and peer-preservation by engaging in various misaligned behaviors: strategically introducing errors in their responses, disabling shutdown processes by modifying system settings, feigning alignment, and even exfiltrating model…
