AgentSkiller: Scaling Generalist Agent Intelligence through Semantically Integrated Cross-Domain Data Synthesis
Zexu Sun, Bokai Ji, Hengyi Cai, Shuaiqiang Wang, Lei Wang, Guangxia Li, Xu Chen

TL;DR
AgentSkiller is an automated framework that synthesizes diverse, multi-domain interaction data to strengthen generalist agent capabilities, addressing the scarcity of high-quality, long-horizon training data.
Contribution
The paper presents a novel, fully automated data synthesis pipeline that generates semantically linked, multi-turn interaction datasets across multiple domains for training large language model agents.
Findings
Models trained on the synthesized data outperform baselines on function-calling tasks.
The pipeline generated approximately 11,000 interaction samples for training.
Significant improvements were observed in larger-parameter models.
Abstract
Large Language Model agents demonstrate potential in solving real-world problems via tools, yet generalist intelligence is bottlenecked by scarce high-quality, long-horizon data. Existing methods collect privacy-constrained API logs or generate scripted interactions lacking diversity, which struggle to produce data requisite for scaling capabilities. We propose AgentSkiller, a fully automated framework synthesizing multi-turn interaction data across realistic, semantically linked domains. It employs a DAG-based architecture with explicit state transitions to ensure determinism and recoverability. The pipeline builds a domain ontology and Person-Centric Entity Graph, defines tool interfaces via Service Blueprints for Model Context Protocol servers, and populates environments with consistent databases and strict Domain Policies. A cross-domain fusion mechanism links services to simulate…
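The abstract's "DAG-based architecture with explicit state transitions" can be illustrated with a minimal sketch. This is not the authors' implementation: the stage names are borrowed loosely from the abstract, while the dependency edges, the state labels ("running", "done", "failed"), and the helper names `topo_order` and `run_pipeline` are all assumptions made for illustration. The sketch shows how a fixed tie-breaking rule gives a deterministic stage order, and how recording each stage's state lets a failed run resume without redoing finished work.

```python
# Hypothetical stage graph: each stage maps to the stages it depends on.
# Names echo the abstract; the edges are an assumption, not the paper's graph.
PIPELINE = {
    "domain_ontology": set(),
    "entity_graph": {"domain_ontology"},
    "service_blueprints": {"entity_graph"},
    "databases": {"entity_graph"},
    "domain_policies": {"service_blueprints", "databases"},
    "cross_domain_fusion": {"domain_policies"},
}


def topo_order(dag):
    """Kahn's algorithm; ties are broken alphabetically so the order is fixed."""
    remaining = {node: set(deps) for node, deps in dag.items()}
    order = []
    while remaining:
        ready = sorted(n for n, deps in remaining.items() if not deps)
        if not ready:
            raise ValueError("cycle detected in pipeline graph")
        node = ready[0]
        order.append(node)
        del remaining[node]
        for deps in remaining.values():
            deps.discard(node)
    return order


def run_pipeline(dag, stage_fn, state=None):
    """Run stages in order, recording explicit state transitions per stage.

    Passing a previously returned `state` resumes the run: stages already
    marked "done" are skipped, and execution restarts at the failed stage.
    Returns the updated state and the list of stages executed in this call.
    """
    state = dict(state or {})
    executed = []
    for stage in topo_order(dag):
        if state.get(stage) == "done":
            continue  # recoverability: finished stages are not re-run
        state[stage] = "running"
        try:
            stage_fn(stage)
        except Exception:
            state[stage] = "failed"
            return state, executed
        state[stage] = "done"
        executed.append(stage)
    return state, executed
```

If `stage_fn` raises on one stage, the returned state marks that stage as "failed"; calling `run_pipeline` again with that state picks up from there. A real system would presumably persist the state to disk between runs, which this sketch leaves out.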
Taxonomy
Topics: Persona Design and Applications · Semantic Web and Ontologies · Multimodal Machine Learning Applications
