Tool-Planner: Task Planning with Clusters across Multiple Tools
Yanming Liu, Xinyue Peng, Jiannan Cao, Yuwei Zhang, Xuhong Zhang, Sheng Cheng, Xun Wang, Jianwei Yin, Tianyu Du

TL;DR
Tool-Planner improves large language models' task planning by grouping tools into toolkits according to their API functions, making multi-tool use more stable, efficient, and adaptable. Results are reported on the ToolBench and APIBench datasets.
Contribution
Introduces a toolkit-based framework that clusters tools by function, improving planning stability and efficiency in multi-tool task execution for LLMs.
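To make the idea of clustering tools by function concrete, here is a minimal sketch. It is not the authors' implementation: the summary above does not say how Tool-Planner measures similarity, so the word-overlap (Jaccard) measure, the greedy assignment, the `threshold` value, and all tool names and descriptions are assumptions chosen only for illustration.

```python
def tokenize(text):
    """Split a description into a set of lowercase words."""
    return set(text.lower().split())


def jaccard(a, b):
    """Word-overlap similarity between two token sets (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_tools(tools, threshold=0.3):
    """Greedily group tools whose descriptions overlap into toolkits.

    tools: dict mapping a (hypothetical) tool name to its description.
    Returns a list of toolkits, each a list of tool names.
    ASSUMPTION: Jaccard overlap stands in for whatever similarity
    measure the paper actually uses.
    """
    toolkits = []  # each entry: {"tokens": set of words, "tools": [names]}
    for name, description in tools.items():
        tokens = tokenize(description)
        best, best_sim = None, threshold
        for kit in toolkits:
            sim = jaccard(tokens, kit["tokens"])
            if sim >= best_sim:
                best, best_sim = kit, sim
        if best is None:
            # No existing toolkit is similar enough: start a new one.
            toolkits.append({"tokens": set(tokens), "tools": [name]})
        else:
            best["tools"].append(name)
            best["tokens"] |= tokens
    return [kit["tools"] for kit in toolkits]


# Hypothetical tools, invented for this example.
TOOLS = {
    "open_weather": "get current weather forecast for a city",
    "weather_now": "get current weather for a city",
    "fx_rates": "convert currency exchange rate between currencies",
    "currency_api": "convert currency amount using exchange rate",
}

toolkits = cluster_tools(TOOLS)
print(toolkits)
```

With these invented descriptions, the two weather tools land in one toolkit and the two currency tools in another, so a planner could reason over two toolkits instead of four individual APIs.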
Findings
High pass and win rates across datasets
Optimized planning schemes for GPT-4 and Claude 3
Effective tool re-selection and adjustment
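The tool re-selection finding can be pictured with a small sketch: when one API in a toolkit fails, another API from the same toolkit is tried before the plan itself is revised. This is an illustration under stated assumptions, not the paper's algorithm. The fixed try-in-order policy, the function `run_with_toolkit`, and both example APIs are hypothetical.

```python
def run_with_toolkit(toolkit, args):
    """Call APIs from one toolkit until one succeeds.

    toolkit: list of (name, callable) pairs that serve the same function.
    args: keyword arguments passed to each API.
    Returns (name_of_api_that_succeeded, result, errors_seen_so_far).
    ASSUMPTION: APIs are tried in list order; the real selection policy
    is not described in the summary above.
    """
    errors = {}
    for name, api in toolkit:
        try:
            return name, api(**args), errors
        except Exception as exc:  # record the failure and re-select
            errors[name] = str(exc)
    raise RuntimeError(f"all APIs in toolkit failed: {errors}")


# Two hypothetical weather APIs, invented for this example.
def flaky_weather(city):
    raise TimeoutError("upstream timed out")


def backup_weather(city):
    return f"Sunny in {city}"


weather_toolkit = [
    ("flaky_weather", flaky_weather),
    ("backup_weather", backup_weather),
]

name, result, errors = run_with_toolkit(weather_toolkit, {"city": "Hangzhou"})
print(name, result, errors)
```

Keeping the retry inside the toolkit means a single failing API does not force the whole plan to be regenerated; only when every API in the toolkit fails does the error surface to the planner.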
Abstract
Large language models (LLMs) have demonstrated exceptional reasoning capabilities, enabling them to solve various complex problems. Recently, this ability has been applied to the paradigm of tool learning. Tool learning involves providing examples of tool usage and their corresponding functions, allowing LLMs to formulate plans and demonstrate the process of invoking and executing each tool. With tools, LLMs can address tasks that they cannot complete independently, thereby enhancing their potential across different tasks. However, this approach faces two key challenges. First, redundant error correction leads to unstable planning and long execution time. Second, designing a correct plan across multiple tools is itself difficult. To address these issues, we propose Tool-Planner, a task-processing framework based on toolkits. Tool-Planner groups tools based on the API functions…
Peer Reviews
Decision: ICLR 2025 Poster
1. The paper is well-written and easy to follow.
2. Comprehensive ablation studies on the core component, the toolkit.
3. Detailed appendix with experimental specifics and case studies.
4. Insightful error analysis.

1. The paper could include more comparisons with tree-based inference methods in LLMs, e.g., [1][2] in related work.
2. Consider comparing with ToolChain*[3].
Originality: This paper builds upon DFSDT and introduces a toolkit which provides better performance.
Quality: The experiments cover domains with many APIs, relevant baselines, and ablations which showcase the importance of the design choices.
Clarity: The paper is presented very clearly, especially methodology. My only nitpick is Figure 2 - my first impression was that this was another Toolchain [1] paper because the search seems to be over single APIs rather than a selection over tool
- Table 1's feature categories are vague (what is Tool Integration?) and don't make it obvious how your work differs from DFSDT. "Tool Clustering" distinguishes this work more from DFSDT.
- Figure 2 looks like DFSDT or ToolChain at a glance; it is not obvious that the nodes were chosen from toolkits. Adding more emphasis / text on the toolkits could make things clearer.
- Toolchain is compared against in Table 1 but missing from the experiments
- The proposed method is both intuitive and effective, as it enhances the accuracy and efficiency of LLMs in tool invocation through the clustering of tools and planning at the toolkit level.
- Experimental results on the ToolBench and APIBench datasets substantiate the efficacy of Tool-Planner in improving both success rates and competitive performance.

- Certain aspects of the methodology require further clarification. For instance, in Section 3.2, the statement "In solving problems for specific states, the model will choose any API within the toolkit for invocation" raises concerns regarding its potential oversimplification. Additionally, since each API necessitates a specific parameter structure for input, how does the model determine the corresponding parameters for an arbitrarily selected API?
- The performance of Tool-Planner is significa
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems
Methods: Attention Is All You Need · Softmax · Layer Normalization · Linear Layer · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Label Smoothing · Adam · Residual Connection · Multi-Head Attention
