TL;DR
UniToolCall introduces a comprehensive framework for standardizing tool-use representation, data, and evaluation in LLM agents, significantly enhancing their tool interaction capabilities.
Contribution
It unifies the tool-learning pipeline from toolset construction through dataset generation to evaluation, curates a pool of 22k+ tools, models diverse interaction patterns, and converts existing benchmarks into a single evaluation format.
Findings
Achieves 93.0% single-turn Strict Precision on the Hybrid-20 benchmark.
Fine-tuning Qwen3-8B substantially improves its tool-use performance.
Outperforms commercial models such as GPT, Gemini, and Claude.
Abstract
Tool-use capability is a fundamental component of LLM agents, enabling them to interact with external systems through structured function calls. However, existing research exhibits inconsistent interaction representations, largely overlooks the structural distribution of tool-use trajectories, and relies on incompatible evaluation benchmarks. We present UniToolCall, a unified framework for tool learning that standardizes the entire pipeline from toolset construction and dataset generation to evaluation. The framework curates a large tool pool of 22k+ tools and constructs a hybrid training corpus of 390k+ instances by combining 10 standardized public datasets with structurally controlled synthetic trajectories. It explicitly models diverse interaction patterns, including single-hop vs. multi-hop and single-turn vs. multi-turn, while capturing both serial and parallel execution…
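The interaction patterns named in the abstract can be made concrete with a small sketch. It is not UniToolCall's actual schema: the class names (`ToolCall`, `Step`, `Turn`), the `classify` helper, and the tool names are all hypothetical. It also assumes that a "hop" is one sequential round of tool calls within a turn, and that calls issued together in one round count as parallel.

```python
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    # Hypothetical structured function call: a tool name plus JSON-like arguments.
    name: str
    arguments: dict


@dataclass
class Step:
    # Assumption: one hop, i.e. the calls issued together before any result returns.
    calls: list


@dataclass
class Turn:
    # One user message and the sequence of hops taken to answer it.
    user_message: str
    steps: list = field(default_factory=list)


def classify(trajectory):
    """Label a trajectory (a list of Turn objects) along the three axes in the abstract."""
    max_hops = max((len(turn.steps) for turn in trajectory), default=0)
    has_parallel = any(
        len(step.calls) > 1 for turn in trajectory for step in turn.steps
    )
    return {
        "turns": "multi-turn" if len(trajectory) > 1 else "single-turn",
        "hops": "multi-hop" if max_hops > 1 else "single-hop",
        "execution": "parallel" if has_parallel else "serial",
    }


# One user turn, two hops; the first hop issues two calls at once.
example = [
    Turn(
        "Compare the weather in Paris and Rome, then book the warmer city.",
        steps=[
            Step([
                ToolCall("get_weather", {"city": "Paris"}),
                ToolCall("get_weather", {"city": "Rome"}),
            ]),
            Step([ToolCall("book_trip", {"city": "Rome"})]),
        ],
    )
]
labels = classify(example)
print(labels)
```

Under these assumptions the example is labeled single-turn, multi-hop, and parallel. The framework's own definitions of these categories may differ.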
