To Call or Not to Call: A Framework to Assess and Optimize LLM Tool Calling
Qinyuan Wu, Soumi Das, Mahsa Amani, Arijit Nag, Seungeon Lee, Krishna P. Gummadi, Abhilasha Ravichander, Muhammad Bilal Zafar

TL;DR
This paper presents a decision-making framework for when a large language model should call an external tool, focusing on web search. It aims to improve task performance by aligning the model's perceived need for and utility of a tool call with the true need and utility.
Contribution
The paper introduces a principled framework, inspired by decision theory, for evaluating and optimizing LLM tool calling. It includes trained estimators of need and utility that improve the quality of tool-call decisions.
Findings
Models' perceived need and utility often misalign with true need and utility.
Estimators of need and utility can be trained to improve tool call decisions.
Controllers based on these estimators outperform setups that rely on the model's self-perceived need and utility, across multiple tasks and models.
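To make the idea of an estimator-based controller concrete, here is a minimal sketch of how need and utility scores could gate web search calls under a limited call budget. It is not the authors' implementation: the `Estimate` class, the `select_tool_calls` function, the `need_threshold` parameter, and the rank-by-utility rule are all assumptions made for illustration, and the scores are made-up numbers rather than results from the paper.

```python
from dataclasses import dataclass


@dataclass
class Estimate:
    """Hypothetical per-query scores (illustrative names, not from the paper)."""
    need: float     # estimated chance the model answers wrongly without the tool
    utility: float  # estimated gain in correctness from calling the tool


def select_tool_calls(estimates, budget, need_threshold=0.5):
    """Return the indices of queries that should trigger a web search call.

    Assumed rule: a query qualifies if its estimated need clears the
    threshold and its estimated utility is positive. Qualifying queries
    are ranked by utility and at most `budget` of them are kept.
    """
    candidates = [
        (i, e.utility)
        for i, e in enumerate(estimates)
        if e.need >= need_threshold and e.utility > 0
    ]
    # Highest estimated utility first, so a tight budget goes to the best calls.
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return sorted(i for i, _ in candidates[:budget])


# Made-up scores for five queries.
estimates = [
    Estimate(need=0.9, utility=0.6),    # needed and helpful
    Estimate(need=0.2, utility=0.4),    # model likely knows the answer already
    Estimate(need=0.8, utility=-0.1),   # needed, but search is expected to hurt
    Estimate(need=0.7, utility=0.3),    # needed and helpful
    Estimate(need=0.95, utility=0.1),   # needed, marginally helpful
]

print(select_tool_calls(estimates, budget=2))  # [0, 3]
```

With a budget of two calls, the sketch skips the query the model can likely answer alone and the query where search is expected to hurt, then spends the budget on the two highest-utility remaining queries. This mirrors the three factors in the framework (necessity, utility, affordability) only loosely.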
Abstract
Agentic AI architectures augment LLMs with external tools, unlocking strong capabilities. However, tool use is not always beneficial; some calls may be redundant or even harmful. Effective tool use, therefore, hinges on a core LLM decision: whether or not to call a tool when performing a task. This decision is particularly challenging for web search tools, where the benefits of external information depend on the model's internal knowledge and its ability to integrate potentially noisy tool responses. We introduce a principled framework inspired by decision-making theory to evaluate web search tool-use decisions along three key factors: necessity, utility, and affordability. Our analysis combines two complementary lenses: a normative perspective that infers true need and utility from an optimal allocation of tool calls, and a descriptive perspective that infers the model's…
