NesTools: A Dataset for Evaluating Nested Tool Learning Abilities of Large Language Models

Han Han; Tong Zhu; Xiang Zhang; Mengsong Wu; Hao Xiong; Wenliang Chen

arXiv:2410.11805 · cs.CL · January 8, 2025

TL;DR

NesTools is a new dataset designed to evaluate the ability of large language models to learn and execute nested tool calls, addressing a gap in existing benchmarks and revealing current models' limitations.

Contribution

The paper introduces NesTools, a high-quality, automatically generated dataset for assessing nested tool learning in LLMs, and provides comprehensive experimental analysis.

Findings

01

Current LLMs struggle with complex nested tool tasks.

02

NesTools enables systematic evaluation of nested tool learning.

03

Large-scale experiments reveal limitations of existing models.

Abstract

Large language models (LLMs) combined with tool learning have achieved impressive results in real-world applications. During tool learning, LLMs may call multiple tools in a nested order, where a later tool call may take an earlier call's response as its input parameters. However, nested tool learning capabilities remain under-explored, since existing benchmarks lack relevant data instances. To address this problem, we introduce NesTools to bridge the current gap in comprehensive nested tool learning evaluation. NesTools is built with a novel automatic data generation method that constructs large-scale nested tool calls with different nesting structures. After manual review and refinement, the dataset is of high quality and closely aligned with real-world scenarios. Therefore, NesTools can serve as a new benchmark for evaluating the nested tool learning abilities of LLMs.…
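
The nesting described in the abstract, where a later tool call consumes an earlier call's response as an input parameter, can be illustrated with a minimal sketch. Everything here is a hypothetical assumption for illustration: the toy tools `get_city_of_user` and `get_weather`, the `"$0"` placeholder convention for referring to an earlier call's output, and the `run_nested` helper. None of this is the NesTools data format or the authors' evaluation code.

```python
from typing import Any, Callable

# Hypothetical toy tools (not from NesTools), backed by fixed lookup tables.
def get_city_of_user(user_id: str) -> str:
    return {"u42": "Paris"}[user_id]

def get_weather(city: str) -> str:
    return {"Paris": "sunny"}[city]

TOOLS: dict[str, Callable[..., Any]] = {
    "get_city_of_user": get_city_of_user,
    "get_weather": get_weather,
}

def resolve(value: Any, results: list[Any]) -> Any:
    """Replace an assumed placeholder like "$0" with the output of call #0."""
    if isinstance(value, str) and value.startswith("$") and value[1:].isdigit():
        idx = int(value[1:])
        if idx >= len(results):
            # A nested call cannot depend on a call that has not run yet.
            raise ValueError(f"call references result {idx} before it exists")
        return results[idx]
    return value

def run_nested(calls: list[dict[str, Any]]) -> list[Any]:
    """Execute calls in order; later calls may consume earlier outputs."""
    results: list[Any] = []
    for call in calls:
        tool = TOOLS[call["name"]]
        args = {k: resolve(v, results) for k, v in call["arguments"].items()}
        results.append(tool(**args))
    return results

calls = [
    {"name": "get_city_of_user", "arguments": {"user_id": "u42"}},
    # Nested call: its "city" argument is the response of call 0.
    {"name": "get_weather", "arguments": {"city": "$0"}},
]
print(run_nested(calls))  # ['Paris', 'sunny']
```

The sketch executes calls strictly in order, so a placeholder pointing at a call that has not yet run raises an error; this mirrors the dependency that makes nested calls harder than independent, parallel ones.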

Code & Models

Repositories

hhan1018/nestools (official)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification