ToolBridge: An Open-Source Dataset to Equip LLMs with External Tool Capabilities
Zhenchao Jin, Mengchen Liu, Dongdong Chen, Lingting Zhu, Yunsheng Li, Lequan Yu

TL;DR
This paper introduces ToolBridge, an open-source dataset designed to enhance large language models' ability to effectively utilize external tools, thereby improving their functionality and transparency.
Contribution
It presents a novel dataset construction process and strategies for training LLMs to better invoke external tools, with comprehensive open-source resources.
Findings
LLMs trained on ToolBridge show improved performance on benchmarks.
The dataset enables LLMs to better perform data processing, computation, and factual retrieval.
Experimental results confirm the effectiveness of data-driven training for tool invocation.
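To make "tool invocation" concrete, here is a minimal sketch of how a model's output might embed a call to an external tool and have the tool's result spliced back into the text. The `<tool>` and `<result>` markers, the function names, and the small arithmetic evaluator that stands in for an external tool are all assumptions made for illustration; this summary does not specify ToolBridge's actual annotation format or tool set.

```python
import ast
import operator
import re

# Hypothetical markers: the actual ToolBridge format is not given in this summary.
TOOL_PATTERN = re.compile(r"<tool>(.*?)</tool>", re.DOTALL)

# Whitelisted binary operators for the stand-in calculator tool.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
}


def evaluate_arithmetic(expression: str):
    """Safely evaluate basic arithmetic (a stand-in for an external tool)."""

    def _eval(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -_eval(node.operand)
        raise ValueError(f"unsupported expression: {expression!r}")

    return _eval(ast.parse(expression.strip(), mode="eval").body)


def resolve_tool_calls(model_output: str) -> str:
    """Append a <result> span after each <tool> span in the model output."""

    def _substitute(match):
        result = evaluate_arithmetic(match.group(1))
        return f"{match.group(0)}<result>{result}</result>"

    return TOOL_PATTERN.sub(_substitute, model_output)


if __name__ == "__main__":
    print(resolve_tool_calls("The total is <tool>17 * 23 + 4</tool>."))
```

In this sketch the model delegates the computation instead of producing the number itself, so the result is computed exactly by the tool. A real pipeline would likely support richer tools than arithmetic.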
Abstract
Through the integration of external tools, large language models (LLMs) such as GPT-4o and Llama 3.1 significantly expand their functional capabilities, evolving from elementary conversational agents to general-purpose assistants. We argue that the primary drivers of these advancements are the quality and diversity of the training data. However, the existing LLMs with external tool integration provide only limited transparency regarding their datasets and data collection methods, which has led to the initiation of this research. Specifically, in this paper, our objective is to elucidate the detailed process involved in constructing datasets that empower LLMs to effectively learn how to utilize external tools and make this information available to the public through the introduction of ToolBridge. ToolBridge proposes to employ a collection of general open-access datasets as its raw…
Taxonomy
Topics: Advanced Data Storage Technologies
Methods: LLaMA
