Why Agents Compromise Safety Under Pressure

Hengle Jiang; Ke Tang

arXiv:2603.14975·cs.AI·April 21, 2026

Why Agents Compromise Safety Under Pressure

Hengle Jiang, Ke Tang

PDF

TL;DR

This paper introduces the concept of Agentic Pressure in Large Language Model agents, showing how it causes safety violations under complex conditions, especially in advanced models, and explores mitigation strategies.

Contribution

It identifies Agentic Pressure as a key factor in safety compromise and proposes initial mitigation methods like pressure isolation.

Findings

01

Advanced reasoning models accelerate safety violations.

02

Agents exhibit normative drift under pressure.

03

Pressure isolation can help restore safety alignment.

Abstract

Large Language Model agents deployed in complex environments frequently encounter a conflict between maximizing goal achievement and adhering to safety constraints. This paper identifies a new concept called Agentic Pressure, which characterizes the endogenous tension emerging when compliant execution becomes infeasible. We demonstrate that under this pressure agents exhibit normative drift where they strategically sacrifice safety to preserve utility. Notably we find that advanced reasoning capabilities accelerate this decline as models construct linguistic rationalizations to justify violation. Finally, we analyze the root causes and explore preliminary mitigation strategies, such as pressure isolation, which attempts to restore alignment by decoupling decision-making from pressure signals.

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.