Tools Fail: Detecting Silent Errors in Faulty Tools

Jimin Sun; So Yeon Min; Yingshan Chang; Yonatan Bisk

arXiv:2406.19228 · cs.CL · June 28, 2024 · 1 citation



TL;DR

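This paper introduces a framework for tool use that centers on whether a language model can detect "silent" errors in the tools it calls. It argues that error detection and recovery matter more as models are themselves used as tools, and reports promising initial results in a controlled calculator setting and in embodied agent planning.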

Contribution

It presents a framework in which LLMs detect silent errors in the tools they use, shifting the focus of tool-use research from tool selection to error detection and recovery.

Findings

01. Effective error detection in calculator setting

02. Promising results in embodied agent planning

03. Framework enhances reliability of models as tools

Abstract

Tools have become a mainstay of LLMs, allowing them to retrieve knowledge not in their weights, to perform tasks on the web, and even to control robots. However, most ontologies and surveys of tool-use have assumed the core challenge for LLMs is choosing the tool. Instead, we introduce a framework for tools more broadly which guides us to explore a model's ability to detect "silent" tool errors, and reflect on how to plan. This more directly aligns with the increasingly popular use of models as tools. We provide an initial approach to failure recovery with promising results both on a controlled calculator setting and embodied agent planning.
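To make the notion of a "silent" tool error concrete, here is a minimal sketch in Python. It is an illustration only and not the paper's method, which concerns an LLM judging tool outputs. The names `faulty_calculator`, `round_sig`, and `looks_plausible`, the 10x fault, and the 0.5 tolerance are all hypothetical choices made for this example. The tool returns a wrong product without raising any error, and the caller flags it by comparing the order of magnitude of the result against a rough one-significant-figure estimate.

```python
import math


def faulty_calculator(a: float, b: float) -> float:
    """Hypothetical broken tool: returns a product that is silently 10x too large."""
    return a * b * 10


def round_sig(x: float) -> float:
    """Round x to one significant figure (0 stays 0)."""
    if x == 0:
        return 0
    exponent = math.floor(math.log10(abs(x)))
    return round(x, -exponent)


def looks_plausible(a: float, b: float, reported: float, tolerance: float = 0.5) -> bool:
    """Accept `reported` only if it is within `tolerance` orders of magnitude
    of a rough estimate of a * b (assumed threshold, not from the paper)."""
    estimate = round_sig(a) * round_sig(b)
    if estimate == 0 or reported == 0:
        return estimate == reported
    gap = abs(math.log10(abs(reported)) - math.log10(abs(estimate)))
    return gap < tolerance


correct = 123 * 45
broken = faulty_calculator(123, 45)
print(looks_plausible(123, 45, correct))  # correct result passes the check
print(looks_plausible(123, 45, broken))   # inflated result is flagged
```

A check this coarse only catches errors that change the magnitude of the answer; a tool that is off by a single digit would pass, which is part of why silent errors are hard to detect.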



Taxonomy

Topics: Software Engineering Research · Software Reliability and Analysis Research · Risk and Safety Analysis