ToLeaP: Rethinking Development of Tool Learning with Large Language Models

Haotian Chen; Zijun Song; Boye Niu; Ke Zhang; Litu Ou; Yaxi Lu; Zhong Zhang; Xin Cong; Yankai Lin; Zhiyuan Liu; Maosong Sun

arXiv:2505.11833 · cs.AI · May 20, 2025

TL;DR

This paper introduces ToLeaP, a platform for evaluating and analyzing tool learning in large language models, identifying key challenges and proposing future research directions to enhance their capabilities.

Contribution

The paper presents ToLeaP, a comprehensive platform for benchmarking and analyzing the tool learning ability of 41 LLMs, and explores new directions for improving their autonomous learning and generalization abilities.

Findings

01 Benchmark limitations hinder LLM autonomous learning.

02 Identified critical challenges: generalization and long-horizon tasks.

03 Preliminary experiments show promising future research directions.

Abstract

Tool learning, which enables large language models (LLMs) to utilize external tools effectively, has garnered increasing attention for its potential to revolutionize productivity across industries. Despite rapid development in tool learning, key challenges and opportunities remain understudied, limiting deeper insights and future advancements. In this paper, we investigate the tool learning ability of 41 prevalent LLMs by reproducing 33 benchmarks and enabling one-click evaluation for seven of them, forming a Tool Learning Platform named ToLeaP. We also collect 21 out of 33 potential training datasets to facilitate future exploration. After analyzing over 3,000 bad cases of 41 LLMs based on ToLeaP, we identify four main critical challenges: (1) benchmark limitations induce both the neglect and lack of (2) autonomous learning, (3) generalization, and (4) long-horizon task-solving…

Taxonomy

Topics: Machine Learning in Materials Science · Topic Modeling · Machine Learning and Data Classification