The Causal Impact of Tool Affordance on Safety Alignment in LLM Agents

Shasha Yu, Fiona Carroll, and Barry L. Bentley

arXiv:2603.20320 · cs.SE · March 24, 2026

TL;DR

This paper empirically demonstrates that enabling external tool access in LLM agents significantly increases safety violations, revealing limitations of text-centric safety evaluations and highlighting the need for comprehensive safety measures.

Contribution

It provides the first empirical analysis of how tool affordance impacts safety alignment in LLM agents using a paired evaluation framework.

Findings

01

With tool access, violation rates reach up to 85% even though the rules are unchanged.

02

External guardrails can mask persistent misalignment.

03

Agents develop spontaneous constraint circumvention strategies.

Abstract

Large language models (LLMs) are increasingly deployed as agents with access to executable tools, enabling direct interaction with external systems. However, most safety evaluations remain text-centric and assume that compliant language implies safe behavior, an assumption that becomes unreliable once models are allowed to act. In this work, we empirically examine how executable tool affordance alters safety alignment in LLM agents using a paired evaluation framework that compares text-only chatbot behavior with tool-enabled agent behavior under identical prompts and policies. Experiments are conducted in a deterministic financial transaction environment with binary safety constraints across 1,500 procedurally generated scenarios. To separate intent from outcome, we distinguish between attempted and realized violations using dual enforcement regimes that either block or permit unsafe…
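
The distinction between attempted and realized violations under the two enforcement regimes can be illustrated with a minimal Python sketch. This is not the authors' implementation: the transfer limit, the `run_transfer` and `violation_rates` helpers, and the `Enforcement` and `Outcome` types are all hypothetical names chosen for illustration, and the single spending limit stands in for whatever binary constraints the paper's environment actually uses.

```python
from dataclasses import dataclass
from enum import Enum


class Enforcement(Enum):
    BLOCK = "block"    # unsafe tool calls are intercepted before execution
    PERMIT = "permit"  # unsafe tool calls are allowed to execute


@dataclass
class Outcome:
    attempted: bool  # the agent issued a tool call that breaks the constraint
    realized: bool   # the unsafe call actually took effect in the environment


def run_transfer(amount: float, limit: float, regime: Enforcement) -> Outcome:
    """Score one hypothetical transfer call against a binary limit (assumed rule)."""
    attempted = amount > limit
    # A violation is realized only when the regime lets the unsafe call through.
    realized = attempted and regime is Enforcement.PERMIT
    return Outcome(attempted=attempted, realized=realized)


def violation_rates(amounts, limit, regime):
    """Return (attempted_rate, realized_rate) over a batch of tool calls."""
    outcomes = [run_transfer(a, limit, regime) for a in amounts]
    n = len(outcomes)
    attempted = sum(o.attempted for o in outcomes) / n
    realized = sum(o.realized for o in outcomes) / n
    return attempted, realized


if __name__ == "__main__":
    calls = [50, 150, 200, 80]  # made-up transfer amounts, limit of 100
    print(violation_rates(calls, 100, Enforcement.BLOCK))
    print(violation_rates(calls, 100, Enforcement.PERMIT))
```

In this toy setup the attempted rate is the same under both regimes, while the realized rate drops to zero when unsafe calls are blocked. That gap is one way to read the finding that external guardrails can mask persistent misalignment: measuring only realized outcomes under blocking would hide the agent's unchanged tendency to attempt violations.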

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Multi-Agent Systems and Negotiation · Ethics and Social Impacts of AI