Tool Calling is Linearly Readable and Steerable in Language Models

Zekun Wu (1, 2), Ze Wang (1, 2), Seonglae Cho (2), Yufei Yang (3), Adriano Koshiyama (1, 2), Sahan Bulathwela (1), Maria Perez-Ortiz (1) ((1) University College London, (2) Holistic AI, (3) Imperial College London)

arXiv:2605.07990·cs.CL·May 11, 2026

TL;DR

This paper shows that instruction-tuned language models encode the identity of the tool they are about to call in a linearly readable form, and that steering this representation switches the selected tool while the generated arguments follow the new tool's schema.

Contribution

The paper shows that tool selection is linearly decodable and steerable from a model's internal activations, and that the same per-tool statistics can flag likely wrong calls before execution, with implications for transparency and control.

Findings

01

Tool identity is linearly readable and steerable inside models.

02

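Adding the mean-difference between two tools' activations switches the selected tool at 77-100% accuracy, and the JSON arguments that follow match the new tool's schema.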

03

Activation patching localizes decision effects to specific attention heads.

Abstract

When a tool-calling agent picks the wrong tool, the failure is invisible until execution: the email gets sent, the meeting gets missed. Probing 12 instruction-tuned models across Gemma 3, Qwen 3, Qwen 2.5, and Llama 3.1 (270M to 27B), we find the identity of the chosen tool is linearly readable and steerable inside the model. Adding the mean-difference between two tools' average internal activations switches which tool the model selects at 77-100% accuracy on name-only single-turn prompts (93-100% at 4B+), and the JSON arguments that follow autoregressively match the new tool's schema, so flipping the name is enough. The same per-tool means also flag likely errors before they happen: on Gemma 3 12B and 27B, queries where the gap between the top-1 and top-2 tool is smallest produce 14-21x more wrong calls than queries with the largest gap. The causal effect concentrates along one…
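The mean-difference steering and the top-1/top-2 gap described in the abstract can be illustrated with a minimal sketch on synthetic data. Everything here is an assumption for illustration: the activations are random vectors rather than real model states, the function names are hypothetical, and scoring tools by negative squared distance to each per-tool mean is one plausible readout, not necessarily the one the authors use. The layer, token position, and any scaling of the steering vector are not specified in the text above and are left out.

```python
import numpy as np

def tool_means(acts, labels, n_tools):
    """Mean activation per tool. acts: (n_examples, d); labels: (n_examples,)."""
    return np.stack([acts[labels == t].mean(axis=0) for t in range(n_tools)])

def steering_vector(means, src, dst):
    """Mean-difference direction that moves activations from tool src toward tool dst."""
    return means[dst] - means[src]

def tool_scores(h, means):
    """Assumed readout: negative squared distance from h to each tool mean."""
    return -((means - h) ** 2).sum(axis=1)

def top2_gap(h, means):
    """Gap between the best and second-best tool score; a small gap suggests a risky call."""
    s = np.sort(tool_scores(h, means))
    return s[-1] - s[-2]

# Synthetic stand-in for hidden states: 3 tools, 16 dimensions, 50 examples each.
rng = np.random.default_rng(0)
n_tools, d, n_per = 3, 16, 50
centers = rng.normal(size=(n_tools, d)) * 4.0
labels = np.repeat(np.arange(n_tools), n_per)
acts = centers[labels] + rng.normal(size=(n_tools * n_per, d))

means = tool_means(acts, labels, n_tools)

h = acts[0]  # an example whose true tool is 0
steered = h + steering_vector(means, src=0, dst=2)

before = int(np.argmax(tool_scores(h, means)))
after = int(np.argmax(tool_scores(steered, means)))
gap = top2_gap(h, means)

print("tool before steering:", before)
print("tool after steering:", after)
print("top-1 vs top-2 gap before steering:", float(gap))
```

In a real model the vector would be added to the residual stream during the forward pass, and the gap would be computed per query to rank calls by risk; both steps depend on details the truncated abstract does not give.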
