AgentSHAP: Interpreting LLM Agent Tool Importance with Monte Carlo Shapley Value Estimation
Miriam Horovicz

TL;DR
AgentSHAP introduces a model-agnostic framework using Monte Carlo Shapley values to explain the importance of external tools in LLM agents, addressing a key gap in AI interpretability.
Contribution
The paper presents AgentSHAP as the first explainability method for tool attribution in LLM agents. It grounds importance scores in Shapley values from game theory and uses Monte Carlo sampling to keep the number of agent calls practical.
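For readers unfamiliar with the game-theoretic background, the standard Shapley value definition is reproduced here. The notation is an assumption made for illustration, not taken from the paper: N is the set of n tools available to the agent, v(S) is a score for the agent's response when only the tools in S are enabled, and phi_i is the importance assigned to tool i.

```latex
\phi_i(v) \;=\; \sum_{S \subseteq N \setminus \{i\}}
  \frac{|S|!\,\bigl(n - |S| - 1\bigr)!}{n!}\,
  \bigl[\, v(S \cup \{i\}) - v(S) \,\bigr]
```

The sum ranges over all 2^(n-1) subsets that exclude tool i, which is why exact computation becomes infeasible as the number of tools grows and why a sampling-based estimate is attractive.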
Findings
AgentSHAP accurately identifies relevant tools in LLM agents.
The method produces consistent importance scores across multiple runs.
It effectively distinguishes between relevant and irrelevant tools.
Abstract
LLM agents that use external tools can solve complex tasks, but understanding which tools actually contributed to a response remains a blind spot. No existing XAI methods address tool-level explanations. We introduce AgentSHAP, the first framework for explaining tool importance in LLM agents. AgentSHAP is model-agnostic: it treats the agent as a black box and works with any LLM (GPT, Claude, Llama, etc.) without needing access to internal weights or gradients. Using Monte Carlo Shapley values, AgentSHAP tests how an agent responds with different tool subsets and computes fair importance scores based on game theory. Our contributions are: (1) the first explainability method for agent tool attribution, grounded in Shapley values from game theory; (2) Monte Carlo sampling that reduces cost from O(2^n) to practical levels; and (3) comprehensive experiments on API-Bank showing that AgentSHAP…
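The abstract's idea of testing an agent with different tool subsets can be illustrated with a generic permutation-sampling Shapley estimator. This is a minimal sketch under stated assumptions, not the paper's implementation: `monte_carlo_shapley`, `value_fn`, and `toy_value` are hypothetical names, and the toy value function merely stands in for whatever score the real method derives from running the agent with a restricted tool set.

```python
import random
from typing import Callable, Dict, FrozenSet, Optional, Sequence


def monte_carlo_shapley(
    tools: Sequence[str],
    value_fn: Callable[[FrozenSet[str]], float],
    num_permutations: int = 200,
    seed: Optional[int] = None,
) -> Dict[str, float]:
    """Estimate Shapley values by averaging marginal gains over random tool orderings."""
    rng = random.Random(seed)
    totals = {tool: 0.0 for tool in tools}
    empty_value = value_fn(frozenset())  # score with no tools enabled

    for _ in range(num_permutations):
        order = list(tools)
        rng.shuffle(order)
        coalition: FrozenSet[str] = frozenset()
        previous = empty_value
        # Add tools one at a time and credit each with the change it causes.
        for tool in order:
            coalition = coalition | {tool}
            current = value_fn(coalition)
            totals[tool] += current - previous
            previous = current

    return {tool: totals[tool] / num_permutations for tool in tools}


def toy_value(subset: FrozenSet[str]) -> float:
    """Hypothetical stand-in for 'run the agent with these tools and score the answer'."""
    score = 0.0
    if "calculator" in subset:
        score += 0.6
    if "search" in subset:
        score += 0.3
    if "calculator" in subset and "search" in subset:
        score += 0.1  # small bonus when both relevant tools are available
    return score  # "weather" never changes the score, so it is irrelevant


scores = monte_carlo_shapley(
    ["calculator", "search", "weather"], toy_value, num_permutations=500, seed=0
)
for name, value in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {value:.3f}")
```

Each sampled permutation costs n + 1 evaluations of the value function at most, so the total cost scales with the number of permutations rather than with 2^n. In this toy game the exact Shapley values are 0.65, 0.35, and 0 for the calculator, search, and weather tools, and the estimates approach them as more permutations are sampled. In a real setting each call to `value_fn` would involve an agent run, so caching results per subset would likely be worthwhile.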
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Ethics and Social Impacts of AI · Generative Adversarial Networks and Image Synthesis
