Why Does Agentic Safety Fail to Generalize Across Tasks?

Yonatan Slutzky; Yotam Alexander; Tomer Slor; Yoav Nagel; Nadav Cohen

arXiv:2605.06992·cs.LG·May 11, 2026

TL;DR

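This paper investigates why the safety of AI agents fails to generalize to unseen tasks. It argues that the cause is inherent to safety itself, since the relationship between a task and its safe execution is more complex than the relationship between a task and its execution alone, and it suggests new directions for ensuring safety.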

Contribution

The paper combines theoretical analysis with empirical evidence to show that safe execution generalizes across tasks less readily than execution alone, which points to limitations of current safety approaches that go beyond the choice of training method.

Findings

01

The mapping from a task to its safe execution has a higher Lipschitz constant than the mapping from a task to its execution alone

02

Experiments in quadcopter and CRM settings show that safety fails to generalize across tasks
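
Finding 01 can be made concrete on a toy problem. The sketch below is not taken from the paper. It assumes a scalar continuous-time linear-quadratic problem in which the "task" is the drift coefficient `a`, and it compares how sharply the optimal feedback gain changes with `a` with and without an $H_{\infty}$-style robustness requirement. The function names, the attenuation level `gamma`, the parameter range, and the restriction to the case where the closed-form Riccati root exists are all illustrative assumptions.

```python
import math

def lqr_gain(a, b=1.0, q=1.0, r=1.0):
    """Gain k (u = -k x) for dx/dt = a x + b u with cost integral of q x^2 + r u^2."""
    # Stabilizing root of the scalar Riccati equation 2 a p - (b^2 / r) p^2 + q = 0.
    c = b * b / r
    p = (a + math.sqrt(a * a + c * q)) / c
    return b * p / r

def hinf_gain(a, gamma, b=1.0, q=1.0, r=1.0):
    """Gain for dx/dt = a x + b u + w under disturbance attenuation level gamma."""
    # Game Riccati equation 2 a p - (b^2 / r - 1 / gamma^2) p^2 + q = 0.
    c = b * b / r - 1.0 / (gamma * gamma)
    if c <= 0:
        # Simplification of this sketch: only handle the case with a closed-form root.
        raise ValueError("gamma is too small for this closed-form sketch")
    p = (a + math.sqrt(a * a + c * q)) / c
    return b * p / r

def lipschitz_estimate(f, lo, hi, n=2001):
    """Largest finite-difference slope of f on an even grid over [lo, hi]."""
    xs = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    ys = [f(x) for x in xs]
    return max(abs(ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i]) for i in range(n - 1))

# Hypothetical task range a in [-2, 2] and attenuation level gamma = 1.5.
L_nominal = lipschitz_estimate(lqr_gain, -2.0, 2.0)
L_robust = lipschitz_estimate(lambda a: hinf_gain(a, gamma=1.5), -2.0, 2.0)
print(f"nominal task-to-gain slope bound: {L_nominal:.3f}")
print(f"robust  task-to-gain slope bound: {L_robust:.3f}")
```

In this toy setting the robust estimate comes out larger than the nominal one, and letting `gamma` grow recovers the nominal gain. This only illustrates the kind of statement the finding makes and says nothing about the paper's actual bounds or proof.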

Abstract

AI agents are increasingly deployed in multi-task settings, where the task to perform is specified at test time, and the agent must generalize to unseen tasks. A major concern in such settings is safety: often, an agent must not only execute unseen tasks, but do so while avoiding risks and handling ones that materialize. Empirical evidence suggests that even when the ability to execute generalizes to unseen tasks, the ability to do so safely frequently does not. This paper provides theory and experiments indicating that failures of agentic safety to generalize across tasks are not merely due to limitations of training methods, but reflect an inherent property of safety itself: the relationship between a task and its safe execution is more complex than the relationship between a task and its execution alone. Theoretically, we analyze linear-quadratic control with $H_{\infty}$-robustness,…
