ToolGen: Unified Tool Retrieval and Calling via Generation
Renxi Wang, Xudong Han, Lei Ji, Shu Wang, Timothy Baldwin, Haonan Li

TL;DR
ToolGen represents each tool as a unique token in a large language model's vocabulary, so tool retrieval and invocation happen directly through language generation. This makes tool use seamless, scalable, and autonomous, and significantly improves tool retrieval and task completion.
Contribution
It presents ToolGen, a method that embeds tools into LLMs as tokens, eliminating the separate retrieval step and improving tool use and task execution.
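To make the tool-as-token idea concrete, here is a minimal Python sketch of how each tool could be mapped to one new token id appended after an existing text vocabulary. The class name `ToolVocab`, the `<<tool_name>>` token spelling, and the base vocabulary size are hypothetical illustrations, not ToolGen's released code.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToolVocab:
    """Hypothetical sketch: one atomic token per tool, appended after the base vocabulary."""

    base_size: int  # number of ordinary text tokens already in the vocabulary
    token_to_id: dict[str, int] = field(default_factory=dict)
    id_to_tool: dict[int, str] = field(default_factory=dict)

    def add_tool(self, tool_name: str) -> int:
        """Register a tool and return its token id; re-adding a tool returns the existing id."""
        token = f"<<{tool_name}>>"  # hypothetical token spelling
        if token in self.token_to_id:
            return self.token_to_id[token]
        new_id = self.base_size + len(self.token_to_id)  # ids continue after the text vocabulary
        self.token_to_id[token] = new_id
        self.id_to_tool[new_id] = tool_name
        return new_id

    def is_tool_id(self, token_id: int) -> bool:
        """True if the id belongs to a tool token rather than an ordinary text token."""
        return token_id in self.id_to_tool

    def tool_for(self, token_id: int) -> str:
        """Map a generated tool token id straight back to its tool name."""
        return self.id_to_tool[token_id]


if __name__ == "__main__":
    vocab = ToolVocab(base_size=32000)  # hypothetical base vocabulary size
    weather = vocab.add_tool("weather_api")
    stocks = vocab.add_tool("stock_quote")
    print(weather, stocks, vocab.tool_for(stocks))  # prints 32000 32001 stock_quote
```

Because every tool owns exactly one id, turning a generated id back into a tool is a dictionary lookup rather than a separate retrieval step.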
Findings
Achieves superior tool retrieval and task completion results.
Enables access to over 47,000 tools without additional retrieval.
Facilitates end-to-end tool learning and integration with advanced techniques.
Abstract
As large language models (LLMs) advance, their inability to autonomously execute tasks by directly interacting with external tools remains a critical limitation. Traditional methods rely on inputting tool descriptions as context, which is constrained by context length and requires separate, often inefficient, retrieval mechanisms. We introduce ToolGen, a paradigm shift that integrates tool knowledge directly into the LLM's parameters by representing each tool as a unique token. This enables the LLM to generate tool calls and arguments through ordinary next-token prediction, seamlessly blending tool invocation with language generation. Our framework allows the LLM to access and utilize a vast number of tools with no additional retrieval step, significantly enhancing both performance and scalability. Experimental results with over 47,000 tools show that ToolGen not only…
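The abstract describes tool calls and arguments being produced by ordinary next-token prediction. The sketch below covers only the post-processing side under an assumed output format: a tool token such as `<<weather_api>>` followed by a JSON object of arguments. That format, the function names `parse_tool_call` and `dispatch`, and the toy registry are hypothetical; ToolGen's actual prompt and output conventions may differ.

```python
from __future__ import annotations

import json
import re
from typing import Any, Callable

# Hypothetical output format: a tool token like <<weather_api>> followed by JSON arguments.
TOOL_TOKEN = re.compile(r"<<([A-Za-z0-9_]+)>>")


def parse_tool_call(generated: str, known_tools: set[str]) -> tuple[str, dict[str, Any]] | None:
    """Extract (tool_name, arguments) from generated text, or None if no valid call is found."""
    match = TOOL_TOKEN.search(generated)
    if match is None:
        return None
    tool = match.group(1)
    if tool not in known_tools:  # reject tool names that are not registered
        return None
    rest = generated[match.end():].lstrip()
    try:
        # raw_decode parses one JSON value and ignores any trailing text after it.
        args, _ = json.JSONDecoder().raw_decode(rest)
    except json.JSONDecodeError:
        return None
    if not isinstance(args, dict):
        return None
    return tool, args


def dispatch(generated: str, registry: dict[str, Callable[..., Any]]) -> Any:
    """Parse a generated call and invoke the matching Python callable; None if parsing fails."""
    parsed = parse_tool_call(generated, set(registry))
    if parsed is None:
        return None
    tool, args = parsed
    return registry[tool](**args)


if __name__ == "__main__":
    registry = {"add": lambda a, b: a + b}  # toy stand-in for a real API
    print(dispatch('Action: <<add>> {"a": 2, "b": 3}', registry))  # prints 5
```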
Peer Reviews
Decision: ICLR 2025 Poster
- [S1] ToolGen outperforms or achieves competitive performance against retrieval and end-to-end baselines on ToolBench.
- [S2] ToolGen can natively invoke 47K tools according to the context.
- [W1] The technical novelty is limited. Using special tokens for tools and incorporating them into the original vocabulary is a widely known approach (e.g. Toolformer: https://arxiv.org/abs/2302.04761, ToolkenGPT: https://arxiv.org/abs/2305.11554). The contribution of this paper is scaling this up to 47K tools, but this is very straightforward, and I'm not confident the ICLR community would be interested in it.
- [W2] Related to [W1], the results of ToolLlama-3 in Section 5 are unclear to me. W…
1. ToolGen elegantly combines tool retrieval and execution into a single generative process, eliminating the need for separate retrieval mechanisms. This streamlines tool interaction and enhances efficiency, particularly as the number of tools increases.
2. The use of constrained beam search during inference effectively restricts the output to valid tool tokens, significantly reducing the generation of nonexistent tools, a common issue in LLM-based agents.
3. ToolGen demonstrates its capacity to…
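The constrained beam search credited above can be sketched as follows, assuming valid tool identifiers are stored in a prefix trie of token ids and that the model's next-token log-probabilities are exposed through a callable. With one atomic token per tool the trie has depth one and the constraint reduces to masking every non-tool token; the trie form also covers multi-token identifiers. The names `build_trie`, `constrained_beam_search`, and `END` are hypothetical, and this is a generic sketch rather than the paper's implementation.

```python
from __future__ import annotations

import math
from typing import Callable, Sequence

END = -1  # hypothetical marker: a complete tool identifier ends at this trie node


def build_trie(tool_sequences: Sequence[Sequence[int]]) -> dict:
    """Prefix trie over the token-id sequences of all valid tool identifiers."""
    root: dict = {}
    for seq in tool_sequences:
        node = root
        for tok in seq:
            node = node.setdefault(tok, {})
        node[END] = {}
    return root


def constrained_beam_search(
    log_probs: Callable[[tuple[int, ...]], dict[int, float]],
    trie: dict,
    beam_width: int = 3,
) -> list[tuple[tuple[int, ...], float]]:
    """Beam search that only ever extends a prefix with tokens allowed by the trie."""
    beams: list[tuple[tuple[int, ...], float, dict]] = [((), 0.0, trie)]
    finished: list[tuple[tuple[int, ...], float]] = []
    while beams:
        candidates = []
        for prefix, score, node in beams:
            scores = log_probs(prefix)  # model's next-token log-probs given the prefix
            for tok, child in node.items():
                if tok == END:
                    finished.append((prefix, score))  # prefix spells a complete, valid tool
                    continue
                # Tokens outside the trie are never considered, so invalid tools cannot appear.
                candidates.append((prefix + (tok,), score + scores.get(tok, -math.inf), child))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    finished.sort(key=lambda f: f[1], reverse=True)
    return finished[:beam_width]


if __name__ == "__main__":
    trie = build_trie([(10,), (11, 12), (11, 13)])  # three toy tools, two sharing a prefix

    def toy_log_probs(prefix: tuple[int, ...]) -> dict[int, float]:
        # Token 99 is not a valid tool start but gets high probability; the trie excludes it.
        table = {
            (): {10: math.log(0.2), 11: math.log(0.5), 99: math.log(0.3)},
            (11,): {12: math.log(0.9), 13: math.log(0.1)},
        }
        return table.get(prefix, {})

    results = constrained_beam_search(toy_log_probs, trie, beam_width=3)
    print([seq for seq, _ in results])  # prints [(11, 12), (10,), (11, 13)]
```

Even though the invalid token 99 is more probable than tool 10 at the first step, it can never be emitted, which is the property the review links to fewer hallucinated tools.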
1. ToolGen's main advantage, combining tool retrieval and execution into a single generative process, also introduces a limitation that comes with its efficiency. Because tools are integrated into the model as tokens, extending it with new tools becomes inefficient: for every new tool/API, a new token must be added and its documentation fine-tuned into the model. Also, consider the case where tools/APIs get updated: maintaining all of them and making sure they are up to date is a qui…
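The maintenance concern is easiest to see on the embedding matrix: each new tool adds a row that must then be trained. Below is a minimal pure-Python sketch, assuming the new row is initialized as the mean of the embeddings of the sub-word tokens in the tool's name; that initialization is a common heuristic used here as an assumption, and the function name `add_tool_embedding` is hypothetical.

```python
from __future__ import annotations

from typing import Sequence


def add_tool_embedding(embeddings: list[list[float]], name_token_ids: Sequence[int]) -> int:
    """Append an embedding row for a new tool token and return the new token id.

    Hypothetical initialization: the element-wise mean of the embeddings of the
    existing sub-word tokens that spell the tool's name. The row is only a starting
    point; the model still has to be fine-tuned before it can use the new tool.
    """
    if not embeddings or not name_token_ids:
        raise ValueError("need a non-empty embedding matrix and at least one name token")
    dim = len(embeddings[0])
    n = len(name_token_ids)
    new_row = [sum(embeddings[t][d] for t in name_token_ids) / n for d in range(dim)]
    embeddings.append(new_row)  # vocabulary grows by exactly one row per tool
    return len(embeddings) - 1


if __name__ == "__main__":
    table = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]  # toy 3-token vocabulary, 2-dim embeddings
    new_id = add_tool_embedding(table, [0, 2])  # tool name spelled by tokens 0 and 2
    print(new_id, table[new_id])  # prints 3 [1.5, 1.0]
```

In a Hugging Face `transformers` setup, the analogous mechanical steps would be `tokenizer.add_tokens(...)` followed by `model.resize_token_embeddings(len(tokenizer))`; in either case the new row is only an initialization and still requires fine-tuning, which is the cost the review points to.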
**Originality:** The paper introduces a novel approach to tool retrieval by representing each tool as a unique virtual token directly integrated into the LLM’s vocabulary. This method eliminates the need for auxiliary retrievers, making the retrieval process more seamless and efficient. The concept of transforming tool retrieval into a generative task is novel and presents a possible solution to the scalability challenges faced by some existing methods.
**Quality:** The authors have conducted a…
**Substantive Assessment of Weaknesses:**
**Cost and Efficiency Claims:** The key claim of "significantly less cost and higher efficiency" does not hold up under scrutiny. The authors have not substantively demonstrated that their framework is less costly or more efficient than existing methodologies. The ToolGen framework necessitates a three-stage training process, which does not inherently suggest reduced costs. Furthermore, the paper lacks data or experiments to substantiate the claim regard…
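One way to make part of the cost comparison concrete is to count prompt tokens. The helpers below compute how many inlined tool descriptions fit in a context window and how long a prompt would be if every description were inlined. The function names and the window size, description length, and query length in the usage are hypothetical placeholders, not figures from the paper, and this covers only per-query context cost; it says nothing about the three-stage training cost the review raises.

```python
def max_tools_in_context(context_window: int, avg_description_tokens: int, query_tokens: int) -> int:
    """How many inlined tool descriptions fit alongside the query (0 if none fit)."""
    return max(0, (context_window - query_tokens) // avg_description_tokens)


def inlined_prompt_tokens(num_tools: int, avg_description_tokens: int, query_tokens: int) -> int:
    """Prompt length if every one of num_tools descriptions were placed in the context."""
    return query_tokens + num_tools * avg_description_tokens


if __name__ == "__main__":
    # Hypothetical placeholders, not measurements from the paper.
    window, desc, query = 8192, 200, 500
    print(max_tools_in_context(window, desc, query))  # prints 38
    print(inlined_prompt_tokens(47_000, desc, query))  # prints 9400500
```

Under these placeholder numbers only a few dozen descriptions fit, which is why description-in-context methods rely on a retriever; a full cost comparison would still need the training-side measurements the review asks for.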
Taxonomy
Topics: Semantic Web and Ontologies · Advanced Database Systems and Queries · Service-Oriented Architecture and Web Services
