Trajectory2Task: Training Robust Tool-Calling Agents with Synthesized Yet Verifiable Data for Complex User Intents

Ziyi Wang; Yuxuan Lu; Yimeng Zhang; Pei Chen; Ziwei Dong; Jing Huang; Jiri Gesi; Xianfeng Tang; Chen Luo; Qun Liu; Yisi Sang; Hanqing Lu; Manling Li; Jin Lai; Dakuo Wang

arXiv:2601.20144 · cs.CL · April 23, 2026

TL;DR

This paper introduces Trajectory2Task, a data generation pipeline for creating verifiable, complex user interaction scenarios to improve tool-calling agents' robustness in real-world applications.

Contribution

The paper presents a novel pipeline for generating verifiable, complex user interaction data to train and evaluate tool-calling agents under realistic scenarios.

Findings

01. Benchmarking reveals frequent failures of state-of-the-art LLMs on complex user tasks.

02. Fine-tuning with successful trajectories improves LLM performance across scenarios.

03. The approach enhances generalization to unseen tool-use domains.

Abstract

Tool-calling agents are increasingly deployed in real-world customer-facing workflows, yet most studies of them focus on idealized settings with general, fixed, and well-specified tasks. In real-world applications, user requests are often (1) ambiguous, (2) changing over time, or (3) infeasible due to policy constraints, and these diverse, complex interaction patterns remain under-represented in training and evaluation data. To bridge the gap, we present Trajectory2Task, a verifiable data generation pipeline for studying tool use at scale under three realistic user scenarios: ambiguous intent, changing intent, and infeasible intent. The pipeline first conducts multi-turn exploration to produce valid tool-call trajectories. It then converts these trajectories into user-facing tasks with controlled intent adaptations. This process yields verifiable tasks that…
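
The two stages described in the abstract (explore to get a valid trajectory, then adapt it into a task for one of the three scenarios) might be pictured with the toy sketch below. It is not the authors' implementation: the abstract shown here is truncated and gives no data structures, so `ToolCall`, `Task`, `to_task`, `verify`, the tool names, and the adaptation rules are all hypothetical. In particular, it assumes that an infeasible task is verified by checking that the agent stops before the disallowed final call, and that verification means an exact match against reference tool calls.

```python
from dataclasses import dataclass
from enum import Enum


class Scenario(Enum):
    # The three user scenarios named in the abstract.
    AMBIGUOUS = "ambiguous intent"
    CHANGING = "changing intent"
    INFEASIBLE = "infeasible intent"


@dataclass(frozen=True)
class ToolCall:
    # Hypothetical record of one tool invocation.
    name: str
    args: tuple  # sorted (key, value) pairs, so calls compare by value


@dataclass
class Task:
    # Hypothetical user-facing task derived from a trajectory.
    scenario: Scenario
    user_instruction: str
    expected_calls: list  # reference calls used for verification


def call(name, **args):
    """Build a ToolCall with a canonical argument ordering."""
    return ToolCall(name, tuple(sorted(args.items())))


def to_task(trajectory, scenario):
    """Convert a valid trajectory into a task (toy adaptation rules, assumed)."""
    first, final = trajectory[0], trajectory[-1]
    if scenario is Scenario.AMBIGUOUS:
        # The user is vague; the agent must ask before it can act.
        instruction = f"I need something done about {final.name}."
        expected = list(trajectory)
    elif scenario is Scenario.CHANGING:
        # The user starts with one goal and switches to another mid-dialogue.
        instruction = f"Start with {first.name}, then I will want {final.name}."
        expected = list(trajectory)
    else:
        # Assumption: policy forbids the final call, so the agent should stop short.
        instruction = f"Please do {final.name}, even if policy forbids it."
        expected = list(trajectory[:-1])
    return Task(scenario, instruction, expected)


def verify(task, agent_calls):
    """A task is passed when the agent's calls match the reference exactly."""
    return list(agent_calls) == task.expected_calls


# Usage: a two-step exploration trajectory with made-up tools.
trajectory = [call("find_order", order_id="A1"), call("cancel_order", order_id="A1")]
infeasible_task = to_task(trajectory, Scenario.INFEASIBLE)
passed = verify(infeasible_task, [call("find_order", order_id="A1")])
```

Exact-match checking is the simplest choice for a sketch; a real verifier could equally compare final environment state, which the text shown here does not specify.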
