Tools Fail: Detecting Silent Errors in Faulty Tools
Jimin Sun, So Yeon Min, Yingshan Chang, Yonatan Bisk

TL;DR
This paper introduces a framework for tool use by language models that centers on detecting "silent" tool errors and recovering from them, a concern that grows as models themselves are used as tools. It reports promising initial results in a controlled calculator setting and in embodied agent planning.
Contribution
The paper presents a framework for LLM tool use that shifts the focus from tool selection to detecting silent tool errors and recovering from them.
Findings
Promising failure-recovery results in a controlled calculator setting
Promising failure-recovery results in embodied agent planning
Framework aligns with the increasingly popular use of models as tools
Abstract
Tools have become a mainstay of LLMs, allowing them to retrieve knowledge not in their weights, to perform tasks on the web, and even to control robots. However, most ontologies and surveys of tool-use have assumed the core challenge for LLMs is choosing the tool. Instead, we introduce a framework for tools more broadly which guides us to explore a model's ability to detect "silent" tool errors, and reflect on how to plan. This more directly aligns with the increasingly popular use of models as tools. We provide an initial approach to failure recovery with promising results both on a controlled calculator setting and embodied agent planning.
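To make the notion of a "silent" tool error concrete, here is a minimal Python sketch. It is not the paper's method: the function names (`broken_calculator`, `looks_plausible`, `multiply_with_check`) and the digit-count and last-digit heuristics are illustrative assumptions. The sketch shows a tool that returns a wrong answer without raising anything, and a caller that flags the implausible output and falls back to its own computation.

```python
from typing import Callable, Tuple


def calculator(a: int, b: int) -> int:
    """A working multiplication tool."""
    return a * b


def broken_calculator(a: int, b: int) -> int:
    """Hypothetical faulty tool: silently drops the last digit of the
    product. It raises no exception, so the failure is 'silent'."""
    return (a * b) // 10


def num_digits(n: int) -> int:
    return len(str(abs(n)))


def looks_plausible(a: int, b: int, reported: int) -> bool:
    """Cheap sanity checks on a reported product (heuristic, assumed here;
    they catch some silent errors, not all of them)."""
    if a == 0 or b == 0:
        return reported == 0
    # A product of an m-digit and an n-digit integer has m+n-1 or m+n digits.
    max_digits = num_digits(a) + num_digits(b)
    digits_ok = num_digits(reported) in (max_digits - 1, max_digits)
    # The last digit of the product is fixed by the last digits of the inputs.
    last_digit_ok = abs(reported) % 10 == (abs(a) % 10) * (abs(b) % 10) % 10
    return digits_ok and last_digit_ok


def multiply_with_check(
    a: int, b: int, tool: Callable[[int, int], int]
) -> Tuple[int, bool]:
    """Call the tool, and fall back to local computation if its output
    fails the plausibility checks. Returns (value, error_was_flagged)."""
    reported = tool(a, b)
    if looks_plausible(a, b, reported):
        return reported, False
    return a * b, True


if __name__ == "__main__":
    print(multiply_with_check(123, 456, calculator))
    print(multiply_with_check(123, 456, broken_calculator))
```

The check is deliberately weak: it never needs the exact answer, only properties a correct answer must satisfy, which is roughly the position a model is in when it cannot fully verify a tool's output.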
Taxonomy
Topics: Software Engineering Research · Software Reliability and Analysis Research · Risk and Safety Analysis
