From Failure to Mastery: Generating Hard Samples for Tool-use Agents

Bingguang Hao; Zengzhuang Xu; Yuntao Wen; Xinyi Xu; Yang Liu; Tong Zhao; Maolin Wang; Long Chen; Dong Wang; Yicheng Chen; Cunyin Peng; Xiangyu Zhao; Chenyi Zhuang; Ji Zhang

arXiv:2601.01498·cs.CL·January 6, 2026

TL;DR

This paper introduces HardGen, an automatic agentic pipeline that generates hard, verifiable training samples for tool-use agents, with the aim of improving how they handle complex reasoning tasks.

Contribution

HardGen builds a dynamic API graph from agent failure cases and samples from it to synthesize hard traces. These traces then guide the instantiation of modular, abstract advanced tools, which are used to formulate hard queries.

Findings

01. A 4B-parameter model trained on HardGen data outperforms leading competitors.

02. HardGen generates complex Chain-of-Thought reasoning samples that can be verified.

03. The approach increases the diversity and difficulty of training data for tool-use agents.

Abstract

The advancement of LLM agents with tool-use capabilities requires diverse and complex training corpora. Existing data generation methods, which predominantly follow a paradigm of random sampling and shallow generation, often yield simple and homogeneous trajectories that fail to capture complex, implicit logical dependencies. To bridge this gap, we introduce HardGen, an automatic agentic pipeline designed to generate hard tool-use training samples with verifiable reasoning. Firstly, HardGen establishes a dynamic API Graph built upon agent failure cases, from which it samples to synthesize hard traces. Secondly, these traces serve as conditional priors to guide the instantiation of modular, abstract advanced tools, which are subsequently leveraged to formulate hard queries. Finally, the advanced tools and hard queries enable the generation of verifiable complex Chain-of-Thought (CoT),…
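The first stage described above (a graph built from failure cases, then sampled to produce traces) can be illustrated with a minimal sketch. The abstract does not say how the graph is constructed or sampled, so everything here is an assumption made for illustration: the function names `build_failure_graph` and `sample_hard_trace`, the API names in the toy data, the choice to weight edges by failure counts, and the use of a weighted random walk. It is not the authors' implementation.

```python
import random
from collections import defaultdict


def build_failure_graph(failed_traces):
    """Count how often API `dst` directly followed API `src` in a failed trajectory.

    Assumption: a failure case is represented as an ordered list of API names.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for trace in failed_traces:
        for src, dst in zip(trace, trace[1:]):
            graph[src][dst] += 1
    return graph


def sample_hard_trace(graph, start, max_len, rng):
    """Weighted random walk: transitions seen in more failures are more likely."""
    trace = [start]
    # Stop at max_len or when the current API has no recorded successors.
    while len(trace) < max_len and graph.get(trace[-1]):
        successors = graph[trace[-1]]
        nxt = rng.choices(list(successors), weights=list(successors.values()), k=1)[0]
        trace.append(nxt)
    return trace


# Hypothetical failure cases; the API names are invented for this example.
failed_traces = [
    ["search_flights", "get_exchange_rate", "book_flight"],
    ["search_flights", "get_exchange_rate", "convert_currency"],
    ["get_exchange_rate", "book_flight"],
]
graph = build_failure_graph(failed_traces)
trace = sample_hard_trace(graph, "search_flights", max_len=3, rng=random.Random(0))
print(trace)
```

Making the graph "dynamic" in this sketch would amount to calling `build_failure_graph` again, or incrementing edge counts in place, as new failure cases are collected. How HardGen actually updates its graph is not stated in the abstract.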

Taxonomy

Topics: Topic Modeling · Advanced Graph Neural Networks · Semantic Web and Ontologies