TL;DR
This paper presents RandomWorld, a pipeline for procedurally generating interactive tools and compositional synthetic tool-use data. Models tuned on this data improve on a range of tool-use benchmarks and set a new state of the art for two metrics on the NESTFUL dataset.
Contribution
RandomWorld, a pipeline that procedurally generates interactive tools and compositional tool-use data, suitable for both SFT and online RL training.
Findings
Models tuned via SFT and RL on RandomWorld data improve on a range of tool-use benchmarks.
Downstream performance scales with the amount of RandomWorld-generated training data.
RandomWorld-tuned models set a new state of the art for two metrics on the NESTFUL dataset.
Abstract
Although the power of LLM tool-use agents has ignited a flurry of recent research in this area, the curation of tool-use training data remains an open problem, especially for online RL training. Existing approaches to synthetic tool-use data generation tend to be non-interactive and/or non-compositional. We introduce RandomWorld, a pipeline for the procedural generation of interactive tools and compositional tool-use data. We show that models tuned via SFT and RL on synthetic RandomWorld data improve on a range of tool-use benchmarks, and set the new SoTA for two metrics on the NESTFUL dataset. Further experiments show that downstream performance scales with the amount of RandomWorld-generated training data, opening up the possibility of further improvement through the use of entirely synthetic data.