ToolEyes: Fine-Grained Evaluation for Tool Learning Capabilities of Large Language Models in Real-world Scenarios
Junjie Ye, Guanyu Li, Songyang Gao, Caishuang Huang, Yilong Wu, Sixian, Li, Xiaoran Fan, Shihan Dou, Tao Ji, Qi Zhang, Tao Gui, Xuanjing Huang

TL;DR
ToolEyes introduces a detailed evaluation system for assessing large language models' ability to learn and effectively use tools in real-world scenarios, addressing limitations of previous outcome-focused methods.
Contribution
It presents a novel, fine-grained evaluation framework that analyzes multiple dimensions of tool learning in authentic scenarios, incorporating a large tool library and comprehensive analysis.
Findings
LLMs show a preference for certain scenarios.
Limited cognitive abilities hinder tool learning.
Larger models may perform worse in tool learning.
Abstract
Existing evaluations of tool learning primarily focus on validating the alignment of selected tools for large language models (LLMs) with expected outcomes. However, these approaches rely on a limited set of scenarios where answers can be pre-determined, diverging from genuine needs. Furthermore, a sole emphasis on outcomes disregards the complex capabilities required for LLMs to effectively use tools. To tackle this issue, we propose ToolEyes, a fine-grained system tailored for the evaluation of the LLMs' tool learning capabilities in authentic scenarios. The system meticulously examines seven real-world scenarios, analyzing five dimensions crucial to LLMs in tool learning: format alignment, intent comprehension, behavior planning, tool selection, and answer organization. Additionally, ToolEyes incorporates a tool library boasting approximately 600 tools, serving as an intermediary…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsTopic Modeling · Natural Language Processing Techniques · Software Engineering Research
MethodsSparse Evolutionary Training · Lib · Focus
