Tool Attention Is All You Need: Dynamic Tool Gating and Lazy Schema Loading for Eliminating the MCP/Tools Tax in Scalable Agentic Workflows

Anuj Sadani; Deepak Kumar

arXiv:2604.21816·cs.AI·April 24, 2026

TL;DR

This paper introduces Tool Attention, a middleware mechanism that reduces token overhead in LLM agent workflows by gating which tools are exposed on each turn and loading tool schemas lazily, which improves efficiency and context utilization.

Contribution

The paper proposes Tool Attention, a method that generalizes self-attention over tokens to gated attention over tools. It combines intent scoring, gating, and lazy schema loading to make external tool integration for LLMs more efficient.

Findings

01. Reduces per-turn tool tokens by 95% in simulations

02. Increases effective context utilization from 24% to 91%

03. Supports the view that protocol-level efficiency is a key constraint in scalable systems

Abstract

The Model Context Protocol (MCP) has become a common interface for connecting large language model (LLM) agents to external tools, but its reliance on stateless, eager schema injection imposes a hidden per-turn overhead (the MCP Tax or Tools Tax) that practitioner reports place between roughly 10k and 60k tokens in typical multi-server deployments. This payload inflates the key-value cache, is associated with reasoning degradation as context utilization approaches published fracture points around 70%, and turns token budgets into a recurring operational cost. We introduce Tool Attention, a middleware-layer mechanism that generalizes the "Attention Is All You Need" paradigm from self-attention over tokens to gated attention over tools. Tool Attention combines (i) an Intent Schema Overlap (ISO) score from sentence embeddings, (ii) a state-aware gating function enforcing preconditions and…
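The components named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the `Tool` class, the `gate_tools` function, the threshold and top-k values, and the tool names are all hypothetical, and a toy bag-of-words vector stands in for the sentence embeddings the paper uses. The sketch only shows the general shape of the idea: score each tool's short summary against the user's intent, drop tools whose preconditions are not met by the current state, and load the full schema only for the tools that pass.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, List, Set


def embed(text: str) -> Counter:
    """Toy stand-in for a sentence embedding: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class Tool:
    name: str
    summary: str                     # short description, cheap to keep around
    requires: Set[str]               # hypothetical preconditions on agent state
    load_schema: Callable[[], dict]  # called only if the tool passes the gate


def gate_tools(intent: str, tools: List[Tool], state: Set[str],
               threshold: float = 0.2, top_k: int = 2) -> Dict[str, dict]:
    """Return full schemas only for tools that pass the state and score gates."""
    query = embed(intent)
    scored = []
    for tool in tools:
        if not tool.requires <= state:   # state-aware gate: unmet precondition
            continue
        score = cosine(query, embed(tool.summary))
        if score >= threshold:           # intent/summary overlap gate
            scored.append((score, tool))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Lazy loading: load_schema() runs only for the selected tools.
    return {tool.name: tool.load_schema() for _, tool in scored[:top_k]}


loaded: List[str] = []  # records which schemas were actually loaded


def make_loader(name: str) -> Callable[[], dict]:
    def loader() -> dict:
        loaded.append(name)
        return {"name": name, "parameters": {}}
    return loader


TOOLS = [
    Tool("search_issues", "search issues in a repository", set(),
         make_loader("search_issues")),
    Tool("merge_pull_request", "merge an open pull request", {"authenticated"},
         make_loader("merge_pull_request")),
    Tool("send_email", "send an email message", set(),
         make_loader("send_email")),
]

active = gate_tools("search open issues in the repository", TOOLS, state=set())
print(sorted(active), loaded)
```

In this toy run only the issue-search tool is selected, so only one schema is loaded; the merge tool is excluded by its unmet precondition and the email tool by its zero overlap score. A real deployment would replace `embed` with an actual sentence-embedding model and tune the threshold empirically.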

Code & Models

Repositories

asadani/tool-attention (GitHub)