Unlocking Implicit Experience: Synthesizing Tool-Use Trajectories from Text

Zhihao Xu; Rumei Li; Jiahuan Li; Rongxiang Weng; Jingang Wang; Xunliang Cai; Xiting Wang

arXiv:2601.10355·cs.CL·January 16, 2026

TL;DR

This paper introduces GEM, a text-based data synthesis pipeline that generates and extracts multi-turn tool-use trajectories from text corpora to improve LLM tool use. GEM-32B improves multi-turn benchmark performance by 16.5%, and a specialized Trajectory Synthesizer matches the pipeline's quality at lower latency.

Contribution

We propose GEM, a scalable method for generating multi-turn tool-use data from text, including a specialized Trajectory Synthesizer for efficient trajectory generation.

Findings

01. GEM-32B improves multi-turn benchmark performance by 16.5%.

02. Text corpora contain rich problem-solving trajectories useful for training.

03. The Trajectory Synthesizer matches pipeline quality with lower latency.

Abstract

Enabling Large Language Models (LLMs) to effectively utilize tools in multi-turn interactions is essential for building capable autonomous agents. However, acquiring diverse and realistic multi-turn tool-use data remains a significant challenge. In this work, we propose a novel text-based paradigm. We observe that textual corpora naturally contain rich, multi-step problem-solving experiences, which can serve as an untapped, scalable, and authentic data source for multi-turn tool-use tasks. Based on this insight, we introduce GEM, a data synthesis pipeline that enables the generation and extraction of multi-turn tool-use trajectories from text corpora through a four-stage process: relevance filtering, workflow & tool extraction, trajectory grounding, and complexity refinement. To reduce the computational cost, we further train a specialized Trajectory Synthesizer via supervised…
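
To make the four-stage process concrete, here is a minimal sketch of how the stages could be chained. It is not the authors' implementation: the abstract does not say how each stage works, so every function, class, and heuristic below (`is_relevant`, `extract_workflow_and_tools`, `ground_trajectory`, `refine_complexity`, the keyword filter, the sentence splitting) is a hypothetical stand-in, and a real pipeline would presumably use model calls in place of these rules.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ToolSpec:
    """A hypothetical tool definition derived from one workflow step."""
    name: str
    description: str


@dataclass
class Turn:
    """One message in a multi-turn trajectory."""
    role: str  # "user", "assistant", or "tool"
    content: str
    tool_name: Optional[str] = None


@dataclass
class Trajectory:
    tools: List[ToolSpec]
    turns: List[Turn] = field(default_factory=list)


# Assumption: procedural text is detected by simple step keywords.
STEP_MARKERS = ("first", "then", "next", "finally")


def is_relevant(text: str) -> bool:
    """Stage 1 (relevance filtering): keep text that looks multi-step."""
    lowered = text.lower()
    return sum(marker in lowered for marker in STEP_MARKERS) >= 2


def extract_workflow_and_tools(text: str) -> Tuple[List[str], List[ToolSpec]]:
    """Stage 2 (workflow & tool extraction): one step and one tool per sentence."""
    steps = [s.strip() for s in text.split(".") if s.strip()]
    tools = [ToolSpec(name=f"tool_{i}", description=s) for i, s in enumerate(steps)]
    return steps, tools


def ground_trajectory(steps: List[str], tools: List[ToolSpec]) -> Trajectory:
    """Stage 3 (trajectory grounding): user request, tool call, tool result per step."""
    traj = Trajectory(tools=tools)
    for step, tool in zip(steps, tools):
        traj.turns.append(Turn("user", step))
        traj.turns.append(Turn("assistant", f"call {tool.name}", tool_name=tool.name))
        traj.turns.append(Turn("tool", "<placeholder result>", tool_name=tool.name))
    return traj


def refine_complexity(traj: Trajectory) -> Trajectory:
    """Stage 4 (complexity refinement): placeholder that adds a follow-up turn."""
    traj.turns.append(Turn("user", "Can you double-check the last step?"))
    return traj


def synthesize(text: str) -> Optional[Trajectory]:
    """Run the four stages in order; return None if the text is filtered out."""
    if not is_relevant(text):
        return None
    steps, tools = extract_workflow_and_tools(text)
    return refine_complexity(ground_trajectory(steps, tools))
```

Calling `synthesize` on a short procedural passage returns a `Trajectory` whose `turns` alternate user, assistant, and tool messages, while text without step markers is dropped at the first stage. Keeping each stage as a separate function mirrors the staged description in the abstract and makes it easy to swap any heuristic for a model-backed version.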

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Natural Language Processing Techniques