Internal Representations as Indicators of Hallucinations in Agent Tool Selection
Kait Healy, Bharathi Srinivasan, Visakh Madathil, Jing Wu

TL;DR
This paper introduces an efficient, real-time method that detects hallucinations in LLM tool selection by analyzing the model's internal representations, improving the reliability of agent systems with minimal additional computation.
Contribution
It presents a novel framework that detects tool-calling hallucinations during the same forward pass used for generation, enabling real-time error detection without multiple model calls or external validation.
Findings
Achieves up to 86.4% detection accuracy.
Effectively identifies parameter-level hallucinations.
Maintains minimal computational overhead during inference.
Abstract
Large Language Models (LLMs) have shown remarkable capabilities in tool calling and tool usage, but suffer from hallucinations in which they choose incorrect tools, provide malformed parameters, and exhibit 'tool bypass' behavior by performing simulations and generating outputs instead of invoking specialized tools or external systems. This undermines the reliability of LLM-based agents in production systems because it leads to inconsistent results and bypasses security and audit controls. Such hallucinations in agent tool selection require early detection and error handling. Unlike existing hallucination detection methods that require multiple forward passes or external validation, we present a computationally efficient framework that detects tool-calling hallucinations in real time by leveraging LLMs' internal representations during the same forward pass used for generation. We evaluate this…
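The abstract describes detecting hallucinations from internal representations captured during generation, but the truncated text does not specify the detector. As a rough illustration only, one common way to read a signal out of hidden states is a lightweight probe: a small classifier trained on hidden-state vectors labeled as correct or hallucinated tool calls. The sketch below assumes a logistic-regression probe and uses synthetic vectors in place of real hidden states; `make_synthetic_states`, `train_probe`, and `hallucination_score` are hypothetical names, not the authors' implementation.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_synthetic_states(n_per_class, dim, shift=1.5, seed=0):
    # ASSUMPTION: stand-in for hidden states taken at the tool-call token.
    # Label 0 = correct tool call, label 1 = hallucinated tool call.
    rng = random.Random(seed)
    data = []
    for _ in range(n_per_class):
        data.append(([rng.gauss(0.0, 1.0) for _ in range(dim)], 0))
        data.append(([rng.gauss(shift, 1.0) for _ in range(dim)], 1))
    rng.shuffle(data)
    features = [x for x, _ in data]
    labels = [y for _, y in data]
    return features, labels


def train_probe(features, labels, lr=0.05, epochs=50):
    # Plain stochastic gradient descent on the logistic loss.
    dim = len(features[0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            g = p - y  # gradient of the loss with respect to the logit
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b


def hallucination_score(w, b, hidden_state):
    # One dot product per call, so the extra cost at inference is small.
    return sigmoid(sum(wi * xi for wi, xi in zip(w, hidden_state)) + b)


if __name__ == "__main__":
    X, y = make_synthetic_states(n_per_class=100, dim=8)
    w, b = train_probe(X, y)
    flagged = sum(hallucination_score(w, b, x) > 0.5 for x in X)
    print(f"flagged {flagged} of {len(X)} synthetic tool calls")
```

In a real agent, the input vector would come from the model's own hidden states for the generated tool call, so scoring reuses the forward pass that produced the call; the layer and token position to read from are design choices this sketch does not address.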
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Topic Modeling
