Model-Adaptive Tool Necessity Reveals the Knowing-Doing Gap in LLM Tool Use
Yize Cheng, Chenrui Fan, Mahdi JafariRaviz, Keivan Rezaei, Soheil Feizi

TL;DR
This paper investigates the discrepancy between how necessary LLMs perceive external tools to be and how they actually use them, revealing a knowing-doing gap and analyzing its underlying causes.
Contribution
The paper introduces a model-adaptive definition of tool necessity, compares it with observed tool-call behavior across models, and diagnoses where the transition from cognition to action fails.
Findings
Substantial mismatch (26.5-54.0%) between model-adaptive tool necessity and observed tool-call behavior across four models.
Both internal cognition and execution signals are linearly decodable.
Most mismatch occurs in the transition from recognizing necessity to acting on it.
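Linear decodability is commonly tested with a linear probe: a linear classifier trained on a model's internal representations to predict a binary label. The summary above does not describe the paper's probing setup, so the following is only a minimal sketch of the general technique. The function names, the toy two-dimensional "hidden states", and the labels are all hypothetical stand-ins, not data or code from the paper.

```python
import math


def train_linear_probe(features, labels, lr=0.5, epochs=200):
    """Fit a logistic-regression probe with full-batch gradient descent.

    features: list of equal-length float vectors (stand-ins for hidden states).
    labels:   list of 0/1 targets (e.g. "tool needed" vs. "tool not needed").
    """
    dim = len(features[0])
    n = len(features)
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(features, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))  # sigmoid
            err = p - y                     # gradient of log loss w.r.t. z
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def probe_accuracy(w, b, features, labels):
    """Fraction of examples where the sign of the linear score matches the label."""
    correct = 0
    for x, y in zip(features, labels):
        z = sum(wi * xi for wi, xi in zip(w, x)) + b
        correct += int((z > 0) == bool(y))
    return correct / len(labels)


# Hypothetical toy data: the label is encoded along the first dimension.
toy_features = [[1.0, 0.2], [0.9, -0.1], [1.2, 0.3],
                [-1.0, 0.1], [-0.8, -0.2], [-1.1, 0.4]]
toy_labels = [1, 1, 1, 0, 0, 0]

weights, bias = train_linear_probe(toy_features, toy_labels)
accuracy = probe_accuracy(weights, bias, toy_features, toy_labels)
```

High probe accuracy on held-out examples is usually read as evidence that the label is linearly encoded in the representation. This toy example scores the probe on its own training data purely to keep the sketch short.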
Abstract
Large language models (LLMs) increasingly act as autonomous agents that must decide when to answer directly and when to invoke external tools. Prior work on adaptive tool use has largely treated tool necessity as a model-agnostic property, annotated by human or LLM judges, and has mostly covered cases where the answer is obvious (e.g., fetching the weather vs. paraphrasing text). However, tool necessity in the wild is more nuanced because capability boundaries diverge across models: a problem that a strong model can solve on its own may still require tools for a weaker one. In this work, we introduce a model-adaptive definition of tool necessity, grounded in each model's empirical performance. Following this definition, we compare the necessity against observed tool-call behavior across four models on arithmetic and factual QA datasets, and find substantial mismatches of 26.5-54.0%…
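The abstract does not spell out how necessity is derived from empirical performance, so the sketch below rests on one assumed operationalization: a tool counts as necessary for a given model on a given problem when that model's accuracy without tools, over several sampled attempts, falls below a threshold. The 0.5 threshold, the record fields, and the example records are hypothetical and chosen only for illustration.

```python
def tool_necessity(no_tool_correct, threshold=0.5):
    """Assumed definition: a tool is necessary if no-tool accuracy is below threshold.

    no_tool_correct: list of booleans, one per sampled no-tool attempt.
    """
    accuracy = sum(no_tool_correct) / len(no_tool_correct)
    return accuracy < threshold


def mismatch_rate(records, threshold=0.5):
    """Fraction of problems where necessity and actual tool-call behavior disagree."""
    mismatches = 0
    for record in records:
        needed = tool_necessity(record["no_tool_correct"], threshold)
        if needed != record["called_tool"]:
            mismatches += 1
    return mismatches / len(records)


# Hypothetical per-problem records for a single model.
records = [
    # Solves it alone and does not call a tool: consistent.
    {"no_tool_correct": [True, True, True, True], "called_tool": False},
    # Mostly fails alone but does not call a tool: mismatch.
    {"no_tool_correct": [False, False, True, False], "called_tool": False},
    # Mostly solves it alone but calls a tool anyway: mismatch.
    {"no_tool_correct": [True, True, False, True], "called_tool": True},
    # Fails alone and calls a tool: consistent.
    {"no_tool_correct": [False, False, False, False], "called_tool": True},
]

rate = mismatch_rate(records)
```

Because necessity is computed per model, the same problem can be labeled "tool needed" for a weaker model and "tool not needed" for a stronger one, which is the point of a model-adaptive definition.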
