ToolForge: A Data Synthesis Pipeline for Multi-Hop Search without Real-World APIs
Hao Chen, Zhexin Hu, Jiajun Chai, Haocheng Yang, Hang He, Xiaohan Wang, Wei Lin, Luhang Wang, Guojun Yin, Zhuofeng zhao

TL;DR
ToolForge is an automated data synthesis framework that creates high-quality, multi-hop search training data without real API calls, enabling smaller models to outperform larger ones on various benchmarks.
Contribution
It introduces a novel, cost-effective data generation pipeline for multi-hop search training that eliminates the need for real API calls and incorporates multi-hop reasoning and self-reflection.
Findings
A small 8B parameter model outperforms GPT-4o on multiple benchmarks.
ToolForge achieves high tool-calling performance without real API data.
The Multi-Layer Validation Framework ensures data quality and fidelity.
Abstract
Training LLMs to invoke tools and leverage retrieved information necessitates high-quality, diverse data. However, existing pipelines for synthetic data generation often rely on tens of thousands of real API calls to enhance generalization, incurring prohibitive costs while lacking multi-hop reasoning and self-reflection. To address these limitations, we introduce ToolForge, an automated synthesis framework that achieves strong real-world tool-calling performance by constructing only a small number of virtual tools, eliminating the need for real API calls. ToolForge leverages a (question, golden context, answer) triple to synthesize large-scale tool-learning data specifically designed for multi-hop search scenarios, further enriching the generated data through multi-hop reasoning and self-reflection mechanisms. To ensure data fidelity, we employ a Multi-Layer Validation Framework that…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsMachine Learning and Data Classification · Data Quality and Management · Topic Modeling
