Unlocking Implicit Experience: Synthesizing Tool-Use Trajectories from Text
Zhihao Xu, Rumei Li, Jiahuan Li, Rongxiang Weng, Jingang Wang, Xunliang Cai, Xiting Wang

TL;DR
This paper introduces GEM, a novel text-based data synthesis pipeline that extracts multi-turn tool-use trajectories from textual corpora to improve LLM tool utilization, achieving significant performance gains and efficiency.
Contribution
We propose GEM, a scalable method for generating multi-turn tool-use data from text, including a specialized Trajectory Synthesizer for efficient trajectory generation.
Findings
GEM-32B improves multi-turn benchmark performance by 16.5%.
Text corpora contain rich problem-solving trajectories useful for training.
The Trajectory Synthesizer matches pipeline quality with lower latency.
Abstract
Enabling Large Language Models (LLMs) to effectively utilize tools in multi-turn interactions is essential for building capable autonomous agents. However, acquiring diverse and realistic multi-turn tool-use data remains a significant challenge. In this work, we propose a novel text-based paradigm. We observe that textual corpora naturally contain rich, multi-step problem-solving experiences, which can serve as an untapped, scalable, and authentic data source for multi-turn tool-use tasks. Based on this insight, we introduce GEM, a data synthesis pipeline that enables the generation and extraction of multi-turn tool-use trajectories from text corpora through a four-stage process: relevance filtering, workflow & tool extraction, trajectory grounding, and complexity refinement. To reduce the computational cost, we further train a specialized Trajectory Synthesizer via supervised…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsTopic Modeling · Multimodal Machine Learning Applications · Natural Language Processing Techniques
