From Self-Evolving Synthetic Data to Verifiable-Reward RL: Post-Training Multi-turn Interactive Tool-Using Agents

Jiaxuan Gao; Jiaao Chen; Chuyi He; Shusheng Xu; Di Jin; Yi Wu

arXiv:2601.22607 · cs.AI · March 11, 2026

TL;DR

This paper introduces EigenData, a hierarchical multi-agent framework that synthesizes tool-using dialogues and employs verifier-based reinforcement learning to improve multi-turn interactive agents, achieving high performance without extensive human data.

Contribution

The paper presents a novel unified framework combining self-evolving synthetic data generation with verifier-based RL for training complex tool-using agents.

Findings

1. Achieved a 73.0% pass rate on the Airline benchmark and 98.3% on the Telecom benchmark.
2. Demonstrated scalable bootstrapping of tool-using behaviors without human annotation.
3. Improved data-generation reliability through a closed-loop process that self-evolves its prompts and workflows.

Abstract

Interactive tool-using agents must solve real-world tasks through multi-turn interaction with both humans and external environments, which requires dialogue state tracking and multi-step tool execution while following complex instructions. Post-training such agents is challenging: high-quality multi-turn tool-use data is difficult to synthesize at scale, and reinforcement learning (RL) can suffer from noisy signals introduced by user simulation, which degrades training efficiency. We propose a unified framework that combines a self-evolving data agent with verifier-based RL. Our system, EigenData, is a hierarchical multi-agent engine that synthesizes tool-grounded dialogues together with executable per-instance checkers, and it improves generation reliability through a closed-loop self-evolving process that updates its prompts and workflow. Building on the synthetic data, we develop an RL recipe that first…
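The abstract pairs each synthesized dialogue with an executable per-instance checker whose verdict can serve as a verifiable RL reward. The checker format is not described on this page, so the following is only a minimal sketch under an assumed design: each task stores the final-environment facts a correct episode must produce (fields that must hold, records that must not exist), and the checker turns a pass/fail verdict into a binary reward. All names here (`TaskInstance`, `make_checker`, `episode_reward`, the reservation records) are hypothetical illustrations, not EigenData's API.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

# Hypothetical environment state after an episode: record ID -> field values.
State = Dict[str, Dict[str, Any]]


@dataclass
class TaskInstance:
    """Hypothetical schema for one synthesized task and its expected outcome."""
    user_goal: str
    expected: State  # only the records/fields the checker inspects
    forbidden_ids: List[str] = field(default_factory=list)  # must be absent afterwards


def make_checker(task: TaskInstance) -> Callable[[State], bool]:
    """Build an executable per-instance checker from the task's expectations."""
    def check(final_state: State) -> bool:
        for rec_id, fields in task.expected.items():
            record = final_state.get(rec_id)
            if record is None:
                return False  # a required record is missing
            if any(record.get(k) != v for k, v in fields.items()):
                return False  # a checked field has the wrong value
        # Records the task required to be removed must not remain.
        return all(rid not in final_state for rid in task.forbidden_ids)
    return check


def episode_reward(checker: Callable[[State], bool], final_state: State) -> float:
    """Binary verifiable reward: 1.0 if the checker passes, else 0.0."""
    return 1.0 if checker(final_state) else 0.0


# Usage with made-up data: downgrade reservation R1 and cancel R2.
task = TaskInstance(
    user_goal="Change reservation R1 to economy and cancel R2",
    expected={"R1": {"cabin": "economy"}},
    forbidden_ids=["R2"],
)
check = make_checker(task)
good = {"R1": {"cabin": "economy", "passengers": 2}}
bad = {"R1": {"cabin": "business"}, "R2": {"cabin": "economy"}}
print(episode_reward(check, good), episode_reward(check, bad))  # 1.0 0.0
```

In this sketch the check inspects only the outcome in the environment, not the wording of the conversation, and ignores fields the task does not constrain (such as `passengers` above).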

Code & Models

Datasets

inclusionAI/AReaL-tau2-data (dataset, 297 downloads)

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Speech and dialogue systems