From Self-Evolving Synthetic Data to Verifiable-Reward RL: Post-Training Multi-turn Interactive Tool-Using Agents
Jiaxuan Gao, Jiaao Chen, Chuyi He, Shusheng Xu, Di Jin, Yi Wu

TL;DR
This paper introduces EigenData, a hierarchical multi-agent framework that synthesizes tool-using dialogues and employs verifier-based reinforcement learning to improve multi-turn interactive agents, achieving high performance without extensive human data.
Contribution
The paper presents a novel unified framework combining self-evolving synthetic data generation with verifier-based RL for training complex tool-using agents.
Findings
Achieved 73.0% pass rate on Airline and 98.3% on Telecom benchmarks.
Demonstrated scalable bootstrapping of tool-using behaviors without human annotation.
Improved generation reliability through closed-loop self-evolving prompts and workflows.
Abstract
Interactive tool-using agents must solve real-world tasks via multi-turn interaction with both humans and external environments, which requires dialogue state tracking and multi-step tool execution while following complex instructions. Post-training such agents is challenging because the synthesis of high-quality multi-turn tool-use data is difficult to scale, and reinforcement learning (RL) can face noisy signals caused by user simulation, which degrades training efficiency. We propose a unified framework that combines a self-evolving data agent with verifier-based RL. Our system, EigenData, is a hierarchical multi-agent engine that synthesizes tool-grounded dialogues together with executable per-instance checkers, and improves generation reliability via a closed-loop self-evolving process that updates prompts and workflow. Building on the synthetic data, we develop an RL recipe that first…
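To make the idea of an "executable per-instance checker" concrete, the following is a minimal sketch, not the authors' implementation. It assumes the environment state can be represented as a plain dictionary and that the reward is binary. Every name here (`SyntheticInstance`, `verifiable_reward`, `make_cancel_checker`, the `reservations` layout) is hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict

# Assumption: the environment state is modeled as a plain dict.
State = Dict[str, Any]
Checker = Callable[[State], bool]


@dataclass
class SyntheticInstance:
    """One synthesized task paired with its own executable checker."""
    task: str
    checker: Checker


def verifiable_reward(instance: SyntheticInstance, final_state: State) -> float:
    """Binary reward: 1.0 if the per-instance checker accepts the final state."""
    try:
        return 1.0 if instance.checker(final_state) else 0.0
    except Exception:
        # A checker that crashes on an unexpected state counts as a failure.
        return 0.0


def make_cancel_checker(reservation_id: str) -> Checker:
    """Hypothetical checker: the named reservation must end up cancelled."""
    def check(state: State) -> bool:
        reservation = state["reservations"][reservation_id]
        return reservation["status"] == "cancelled"
    return check


instance = SyntheticInstance(
    task="Cancel reservation R1 for the user.",
    checker=make_cancel_checker("R1"),
)
good_state = {"reservations": {"R1": {"status": "cancelled"}}}
bad_state = {"reservations": {"R1": {"status": "active"}}}

print(verifiable_reward(instance, good_state))
print(verifiable_reward(instance, bad_state))
```

In this sketch the reward depends only on the final environment state rather than on the wording of the dialogue, which is one plausible way a checker could stay robust to variation in a simulated user's phrasing.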
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Speech and dialogue systems
