Controllable and Verifiable Tool-Use Data Synthesis for Agentic Reinforcement Learning

Siyuan Xu; Shiyang Li; Xin Liu; Tianyi Liu; Yixiao Li; Zhan Shi; Zixuan Zhang; Zilong Wang; Qingyu Yin; Jianshu Chen; Tuo Zhao; Bing Yin

arXiv:2604.09813 · cs.AI · April 14, 2026

TL;DR

COVERT is a two-stage data synthesis pipeline that builds verifiable, increasingly complex tool-use environments for reinforcement learning of agentic models, improving their robustness and tool-use accuracy.

Contribution

The paper introduces COVERT, a method for generating complex tool-use environments in which oracle tool calls and final answers are preserved as ground truth, so that RL training can rely on reward-checkable online rollouts.

Findings

01. COVERT-RL improves accuracy on BFCL v3 from 56.5 to 59.9.

02. COVERT-RL improves accuracy on ACEBench from 53.0 to 59.3.

03. Stacking COVERT-RL on top of SFT raises accuracy further, to 62.1 and 61.8 on the same two benchmarks.

Abstract

Existing synthetic tool-use corpora are primarily designed for offline supervised fine-tuning, yet reinforcement learning (RL) requires executable environments that support reward-checkable online rollouts. We propose COVERT, a two-stage pipeline that first generates reliable base tool-use trajectories through self-evolving synthesis with multi-level validation, and then applies oracle-preserving augmentations that systematically increase environmental complexity. These augmentations introduce distractor tools, indirect or ambiguous user queries, and noisy, multi-format, or erroneous tool outputs, while strictly preserving oracle tool calls and final answers as ground truth. This design enables automatic reward computation via reference matching for standard cases and lightweight judge-assisted verification for special behaviors such as error detection, supporting RL optimization of…
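The two ideas in the abstract, an oracle-preserving augmentation and a reward computed by reference matching, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The names `ToolCall`, `Task`, `add_distractor_tools`, and `reference_match_reward` are hypothetical, and the matching rule (exact, ordered tool-call match plus a case-insensitive answer match, giving a binary reward) is an assumption, since this excerpt does not specify how COVERT compares rollouts against the oracle.

```python
import json
import random
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    """One tool invocation (hypothetical structure for illustration)."""
    name: str
    arguments: dict = field(default_factory=dict)

    def key(self) -> str:
        # Canonical JSON so that argument order does not affect matching.
        return json.dumps({"name": self.name, "arguments": self.arguments},
                          sort_keys=True)


@dataclass
class Task:
    """A tool-use task with its ground-truth (oracle) solution."""
    query: str
    tools: list          # names of the tools offered to the agent
    oracle_calls: list   # list of ToolCall, the reference trajectory
    oracle_answer: str   # reference final answer


def add_distractor_tools(task, distractors, k, seed=0):
    """Offer up to k extra tools while leaving the oracle untouched."""
    rng = random.Random(seed)
    candidates = [t for t in distractors if t not in task.tools]
    extra = rng.sample(candidates, min(k, len(candidates)))
    tools = task.tools + extra
    rng.shuffle(tools)
    # Only the tool list changes; oracle calls and answer are copied as-is.
    return Task(task.query, tools, list(task.oracle_calls), task.oracle_answer)


def reference_match_reward(task, predicted_calls, predicted_answer):
    """Assumed rule: 1.0 only if calls and answer both match the oracle."""
    same_calls = ([c.key() for c in predicted_calls]
                  == [c.key() for c in task.oracle_calls])
    same_answer = (predicted_answer.strip().lower()
                   == task.oracle_answer.strip().lower())
    return 1.0 if same_calls and same_answer else 0.0


if __name__ == "__main__":
    base = Task(
        query="What is the weather in Paris?",
        tools=["get_weather"],
        oracle_calls=[ToolCall("get_weather", {"city": "Paris"})],
        oracle_answer="18 C",
    )
    harder = add_distractor_tools(base, ["get_news", "get_stock"], k=2)
    rollout = [ToolCall("get_weather", {"city": "Paris"})]
    print(harder.tools)
    print(reference_match_reward(harder, rollout, "18 C"))
```

Because the augmentation changes only what the agent sees and never the oracle, the same reward function applies to the base task and to its harder variant. Cases the abstract calls special behaviors, such as error detection, would need the judge-assisted check instead, which this sketch does not cover.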
