ToolGen: Unified Tool Retrieval and Calling via Generation

Renxi Wang; Xudong Han; Lei Ji; Shu Wang; Timothy Baldwin; Haonan Li

arXiv:2410.03439·cs.CL·April 1, 2025

TL;DR

ToolGen introduces a novel approach where tools are embedded as tokens within large language models, enabling seamless, scalable, and autonomous tool invocation directly through language generation, significantly improving AI task performance.

Contribution

It presents ToolGen, a method that embeds tools into LLMs as tokens, eliminating retrieval steps and enhancing tool utilization and task execution capabilities.

Findings

01

Achieves superior tool retrieval and task completion results.

02

Enables access to over 47,000 tools without additional retrieval.

03

Facilitates end-to-end tool learning and integration with advanced techniques.

Abstract

As large language models (LLMs) advance, their inability to autonomously execute tasks by directly interacting with external tools remains a critical limitation. Traditional methods rely on inputting tool descriptions as context, which is constrained by context length and requires separate, often inefficient, retrieval mechanisms. We introduce ToolGen, a paradigm shift that integrates tool knowledge directly into the LLM's parameters by representing each tool as a unique token. This enables the LLM to generate tool calls and arguments as part of its next-token prediction capabilities, seamlessly blending tool invocation with language generation. Our framework allows the LLM to access and utilize a vast number of tools with no additional retrieval step, significantly enhancing both performance and scalability. Experimental results with over 47,000 tools show that ToolGen not only…
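The idea of "representing each tool as a unique token" can be illustrated with a toy vocabulary. The sketch below is only an assumption-laden illustration, not the authors' implementation: the `ToyVocab` class, the `<<name>>` token format, and the tool names are all hypothetical, and the training stages that teach a model what each token means are not shown.

```python
from dataclasses import dataclass, field


@dataclass
class ToyVocab:
    """A minimal stand-in for a tokenizer vocabulary (hypothetical)."""
    token_to_id: dict = field(default_factory=dict)

    def add(self, token: str) -> int:
        # Assign the next free id to unseen tokens; reuse the id otherwise.
        if token not in self.token_to_id:
            self.token_to_id[token] = len(self.token_to_id)
        return self.token_to_id[token]


def tool_token(name: str) -> str:
    # Assumed surface form for a tool token; the paper's format may differ.
    return f"<<{name}>>"


def register_tools(vocab: ToyVocab, tool_names: list) -> dict:
    """Give every tool exactly one new vocabulary entry."""
    return {name: vocab.add(tool_token(name)) for name in tool_names}


vocab = ToyVocab()
for word in ["get", "the", "weather", "in", "Paris"]:
    vocab.add(word)

# Hypothetical tool names, used only for illustration.
tool_ids = register_tools(vocab, ["weather_lookup", "currency_convert"])
print(tool_ids)
```

In a real model each new id would also need a row in the embedding and output layers, which is why adding a tool implies further training rather than a simple index update.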

Peer Reviews

Decision·ICLR 2025 Poster

Reviewer 01 · Rating 5 · Confidence 5

Strengths

- [S1] ToolGen outperforms or achieves competitive performance among retrieval and end-to-end baselines on ToolBench.
- [S2] ToolGen can natively invoke 47K tools following the context.

Weaknesses

- [W1] The technical novelty is limited. Using special tokens for tools and incorporating them into the original vocabulary is a widely known approach (e.g. Toolformer: https://arxiv.org/abs/2302.04761, ToolkenGPT: https://arxiv.org/abs/2305.11554). The contribution of this paper is scaling this up to 47K tools, but that is very straightforward, and I'm not confident the ICLR community would be interested in it.
- [W2] Related to [W1], the results of ToolLlama-3 in Section 5 are unclear to me. W

Reviewer 02 · Rating 5 · Confidence 3

Strengths

1. ToolGen elegantly combines tool retrieval and execution into a single generative process, eliminating the need for separate retrieval mechanisms. This streamlines tool interaction and enhances efficiency, particularly as the number of tools increases.
2. The use of constrained beam search during inference effectively restricts the output to valid tool tokens, significantly reducing the generation of nonexistent tools, a common issue in LLM-based agents.
3. ToolGen demonstrates its capacity to
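The second point, restricting output to valid tool tokens, can be sketched as a single constrained decoding step. This is a simplified illustration rather than the paper's constrained beam search: it masks every non-tool token and keeps the top-k survivors for one step only. The function name, the logits, and the token ids are hypothetical.

```python
import math


def constrained_top_k(logits, valid_ids, k=2):
    """Return up to k token ids with the highest scores among valid_ids."""
    # Mask out every token that is not a registered tool token.
    masked = [
        score if i in valid_ids else -math.inf
        for i, score in enumerate(logits)
    ]
    # Rank all ids by masked score, best first.
    ranked = sorted(range(len(masked)), key=lambda i: masked[i], reverse=True)
    # Drop masked entries in case k exceeds the number of valid tokens.
    return [i for i in ranked[:k] if masked[i] != -math.inf]


# Hypothetical scores over a five-token vocabulary; ids 2 and 4 are tools.
logits = [2.5, 0.1, 1.7, 3.0, 0.9]
valid_tool_ids = {2, 4}

best = constrained_top_k(logits, valid_tool_ids, k=2)
print(best)
```

Token 3 has the highest raw score but is never returned, because it is not in the valid set; this is the mechanism that prevents a model from emitting a tool that does not exist.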

Weaknesses

1. ToolGen's advantage, combining tool retrieval and execution into a single generative process, brings a limitation along with its efficiency. Since the tools are integrated into the system as tokens, adding new tools becomes inefficient. For every new tool/API, a new token needs to be added and the documentation fine-tuned into the model. Also, consider the case where the tools/APIs get updated: maintaining all the tools/APIs and making sure they are up to date is a qui

Reviewer 03 · Rating 5 · Confidence 3

Strengths

**Originality:** The paper introduces a novel approach to tool retrieval by representing each tool as a unique virtual token directly integrated into the LLM’s vocabulary. This method eliminates the need for auxiliary retrievers, making the retrieval process more seamless and efficient. The concept of transforming tool retrieval into a generative task is novel and presents a possible solution to the scalability challenges faced by some existing methods.

**Quality:** The authors have conducted a

Weaknesses

**Substantive Assessment of Weaknesses:**

**Cost and Efficiency Claims:** The key claim of "significantly less cost and higher efficiency" does not hold up under scrutiny. The authors have not substantively demonstrated that their framework is less costly or more efficient than existing methodologies. The ToolGen framework necessitates a three-stage training process, which does not inherently suggest reduced costs. Furthermore, the paper lacks data or experiments to substantiate the claim regard

Code & Models

Repositories

Reason-Wang/ToolGen
pytorch · Official

Datasets

reasonwang/ToolGen-Datasets
dataset · 158 dl

Taxonomy

Topics: Semantic Web and Ontologies · Advanced Database Systems and Queries · Service-Oriented Architecture and Web Services