DIVE: Scaling Diversity in Agentic Task Synthesis for Generalizable Tool Use

Aili Chen; Chi Zhang; Junteng Liu; Jiangjie Chen; Chengyu Du; Yunji Li; Ming Zhong; Qin Wang; Zhengmao Zhu; Jiayuan Song; Ke Ji; Junxian He; Pengyu Zhao; Yanghua Xiao

arXiv:2603.11076·cs.AI·March 13, 2026

Open Access

TL;DR

This paper introduces DIVE, a method to generate diverse, grounded agentic tasks for training language models, significantly improving out-of-distribution generalization by emphasizing diversity over sheer data quantity.

Contribution

DIVE provides a novel, evidence-driven task synthesis approach that scales diversity along controllable axes, enhancing generalization in tool-using language models.

Findings

01. DIVE improves OOD benchmark scores by +22 points on average.

02. DIVE outperforms baselines by +68 points with less data.

03. Diversity scaling outperforms quantity scaling for generalization.

Abstract

Recent work synthesizes agentic tasks for post-training tool-using LLMs, yet robust generalization under shifts in tasks and toolsets remains an open challenge. We trace this brittleness to insufficient diversity in synthesized tasks. Scaling diversity is difficult because training requires tasks to remain executable and verifiable, while generalization demands coverage of diverse tool types, toolset combinations, and heterogeneous tool-use patterns. We propose DIVE, an evidence-driven recipe that inverts synthesis order, executing diverse, real-world tools first and reverse-deriving tasks strictly entailed by the resulting traces, thereby providing grounding by construction. DIVE scales structural diversity along two controllable axes, tool-pool coverage and per-task toolset variety, and an Evidence Collection–Task Derivation loop further induces rich multi-step tool-use patterns…
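To make the inverted synthesis order concrete, here is a minimal toy sketch of the idea described in the abstract: sample a toolset, execute the tools first, then derive a task whose answer is read directly off the recorded trace. It is not the authors' implementation. The tool pool, the function names (`sample_toolset`, `collect_evidence`, `derive_task`, `verify`), and the arithmetic "tools" are all hypothetical stand-ins chosen for illustration; the paper uses real-world tools, and its derivation step is not specified in this excerpt.

```python
import random
from typing import Callable, Dict, List, Tuple

# Hypothetical toy tools (assumption): stand-ins for the real-world tools in the paper.
TOOL_POOL: Dict[str, Callable[[int], int]] = {
    "double": lambda x: 2 * x,
    "square": lambda x: x * x,
    "negate": lambda x: -x,
    "add_ten": lambda x: x + 10,
}

# One trace entry per tool call: (tool name, input, output).
Trace = List[Tuple[str, int, int]]


def sample_toolset(rng: random.Random, k: int) -> List[str]:
    """Draw k distinct tools from the pool (a stand-in for per-task toolset variety)."""
    return rng.sample(sorted(TOOL_POOL), k)


def collect_evidence(toolset: List[str], seed_value: int) -> Trace:
    """Execute the tools first, feeding each output into the next call."""
    trace: Trace = []
    value = seed_value
    for name in toolset:
        result = TOOL_POOL[name](value)
        trace.append((name, value, result))
        value = result
    return trace


def derive_task(trace: Trace) -> Dict[str, object]:
    """Derive a task after execution, so its answer is entailed by the trace."""
    steps = ", then ".join(name for name, _, _ in trace)
    return {
        "question": f"Starting from {trace[0][1]}, apply {steps}. What is the result?",
        "answer": trace[-1][2],
        "tools": [name for name, _, _ in trace],
        "seed": trace[0][1],
    }


def verify(task: Dict[str, object]) -> bool:
    """Re-execute the recorded tool sequence and compare with the stored answer."""
    replay = collect_evidence(task["tools"], task["seed"])
    return replay[-1][2] == task["answer"]


if __name__ == "__main__":
    rng = random.Random(0)
    toolset = sample_toolset(rng, k=2)
    task = derive_task(collect_evidence(toolset, seed_value=3))
    print(task["question"], "verified:", verify(task))
```

Because the task is written after the tools have run, the stored answer is reproducible by replaying the same calls, which is the sense in which this toy is "grounded by construction". Widening `TOOL_POOL` or varying `k` loosely mirrors the two diversity axes named in the abstract.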

Taxonomy

Topics: Robot Manipulation and Learning · Machine Learning in Materials Science · Machine Learning and Data Classification