SMARTCAL: An Approach to Self-Aware Tool-Use Evaluation and Calibration
Yuanhao Shen, Xiaodan Zhu, Lei Chen

TL;DR
This paper investigates the self-awareness and calibration issues of large language models in tool use, revealing overconfidence and misuse, and proposes SMARTCAL to improve performance and calibration accuracy.
Contribution
The paper introduces SMARTCAL, a novel approach to enhance LLMs' self-awareness and calibration in tool use, addressing a critical trustworthiness challenge.
Findings
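SMARTCAL yields an average 8.6% increase in QA performance
SMARTCAL yields a 21.6% decrease in calibration error
Tool-abuse behavior (misusing tools with overconfidence) is common across models, regardless of capability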
Abstract
The tool-use ability of Large Language Models (LLMs) has a profound impact on a wide range of industrial applications. However, LLMs' self-control and calibration capability in appropriately using tools remains understudied. The problem is consequential as it raises potential risks of degraded performance and poses a threat to the trustworthiness of the models. In this paper, we conduct a study on a family of state-of-the-art LLMs on three datasets with two mainstream tool-use frameworks. Our study reveals the tool-abuse behavior of LLMs, a tendency for models to misuse tools with overconfidence. We also find that this is a common issue regardless of model capability. Accordingly, we propose a novel approach, SMARTCAL, to mitigate the observed issues, and our results show an average of 8.6 percent increase in the QA performance and a 21.6 percent decrease in Expected…
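The calibration error reported above compares how confident a model says it is with how often it is actually right. As a rough illustration only, the sketch below computes a binned calibration error of the kind commonly used for this purpose. The function name, the equal-width binning, and the default of 10 bins are assumptions made for this example; none of it is taken from the paper's evaluation code.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> float:
    """Binned calibration error (illustrative sketch, not the paper's code).

    confidences: model-stated confidence per answer, each in [0, 1].
    correct:     whether each answer was actually right.
    n_bins:      number of equal-width confidence bins (assumed default).
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    total = len(confidences)
    if total == 0:
        return 0.0

    # Group each prediction into an equal-width bin by its confidence.
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        index = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[index].append((conf, ok))

    # Sum the |average confidence - accuracy| gap per bin, weighted by bin size.
    error = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(conf for conf, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        error += (len(members) / total) * abs(avg_conf - accuracy)
    return error
```

Under this definition, a model that reports 0.9 confidence on four answers but gets only three right has a gap of 0.9 - 0.75 in its single occupied bin; an overconfident model of the kind the abstract describes would show larger gaps.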
Taxonomy
Topics: Business Process Modeling and Analysis · Scientific Computing and Data Management · Semantic Web and Ontologies
