Do LLMs Know Tool Irrelevance? Demystifying Structural Alignment Bias in Tool Invocations
Yilong Liu, Xixun Lin, Pengfei Cao, Ge Zhang, Fang Fang, Yanan Cao

TL;DR
This paper uncovers a structural alignment bias in LLMs that causes them to invoke irrelevant tools, introduces SABEval for analysis, and proposes a mitigation strategy to improve tool invocation accuracy.
Contribution
It identifies a previously overlooked mechanistic flaw in LLM tool refusal, introduces the SABEval dataset, and develops a rebalancing method to reduce structural alignment bias.
Findings
Structural alignment bias causes LLMs to invoke irrelevant tools.
SABEval effectively decouples structural alignment from semantic relevance.
The proposed mitigation strategy reduces bias without harming tool-use capabilities.
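To make the first two findings concrete, here is a minimal sketch of the distinction between structural alignment (query attributes can be validly assigned to tool parameters) and semantic relevance (the tool serves the user's goal). It is an illustration only, not the SABEval format or the authors' code: the `Tool` and `Query` classes, the `goal` labels, and the helper functions are all hypothetical names chosen for this example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    # Hypothetical tool description; not the SABEval schema.
    name: str
    goal: str                    # what the tool actually accomplishes
    params: dict[str, type]      # required parameter name -> expected type


@dataclass(frozen=True)
class Query:
    # Hypothetical query representation with pre-extracted attributes.
    text: str
    goal: str                    # what the user actually wants
    attributes: dict[str, object]


def structurally_aligned(query: Query, tool: Tool) -> bool:
    """True if every tool parameter can be filled by a query attribute of the expected type."""
    return all(
        name in query.attributes and isinstance(query.attributes[name], expected)
        for name, expected in tool.params.items()
    )


def semantically_relevant(query: Query, tool: Tool) -> bool:
    """True if the tool serves the user's goal (assumed here to be a simple label match)."""
    return query.goal == tool.goal


def should_invoke(query: Query, tool: Tool) -> bool:
    """Desired behavior: invoke only when the tool is relevant, regardless of parameter fit."""
    return semantically_relevant(query, tool)


book_flight = Tool(
    name="book_flight",
    goal="book_flight",
    params={"origin": str, "destination": str, "date": str},
)

query = Query(
    text="How long is the flight from Paris to Rome on 2024-05-01?",
    goal="get_flight_duration",
    attributes={"origin": "Paris", "destination": "Rome", "date": "2024-05-01"},
)

print("structurally aligned:", structurally_aligned(query, book_flight))
print("semantically relevant:", semantically_relevant(query, book_flight))
print("should invoke:", should_invoke(query, book_flight))
```

In this toy case every parameter of `book_flight` can be filled from the query, yet booking a flight does not answer a question about flight duration, so the desired behavior is to refrain from invoking the tool. A model affected by the bias described here would be tempted to call it anyway.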
Abstract
Large language models (LLMs) have demonstrated impressive capabilities in utilizing external tools. In practice, however, LLMs are often exposed to tools that are irrelevant to the user's query, in which case the desired behavior is to refrain from invocations. In this work, we identify a widespread yet overlooked mechanistic flaw in tool refusal, which we term structural alignment bias: Even when a tool fails to serve the user's goal, LLMs still tend to invoke it whenever query attributes can be validly assigned to tool parameters. To systematically study this bias, we introduce SABEval, a new dataset that decouples structural alignment from semantic relevance. Our analysis shows that structural alignment bias induces severe tool-invocation errors in LLMs, yet remains largely unaccounted for in existing evaluations. To investigate the internal mechanisms underlying this bias, we…
