OrchDAG: Complex Tool Orchestration in Multi-Turn Interactions with Plan DAGs
Yifu Lu, Shengjie Liu, Li Dong

TL;DR
OrchDAG introduces a synthetic dataset modeling complex multi-turn tool interactions as DAGs, providing a challenging benchmark and a graph-based reward to improve reinforcement learning in agentic tool use scenarios.
Contribution
The paper presents OrchDAG, a novel synthetic dataset and reward mechanism that enhance modeling and training of complex multi-turn tool interactions using DAGs.
Findings
The dataset is challenging but solvable.
Graph-based reward improves RL training.
Leveraging topological structure enhances performance.
Abstract
Agentic tool use has gained traction with the rise of agentic tool calling, yet most existing work overlooks the complexity of multi-turn tool interactions. We introduce OrchDAG, a synthetic data generation pipeline that models tool execution as directed acyclic graphs (DAGs) with controllable complexity. Using this dataset, we benchmark model performance and propose a graph-based reward to enhance reinforcement learning with verifiable rewards (RLVR) training. Experiments show that the dataset presents a challenging but solvable benchmark, and the proposed reward is effective when combined with GRPO-style algorithms, highlighting the importance of leveraging topological structure and data complexity in multi-turn tool use.
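To make the idea of a plan DAG and a graph-based reward concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's implementation: the tool names, the dictionary encoding of a plan, and the reward itself (an equal-weight average of node F1 and edge F1 against a reference plan, with zero reward for cyclic plans) are all hypothetical choices made for this example. The abstract does not specify how OrchDAG's reward is actually computed.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical plan encoding: each tool maps to the set of tools it depends on.
reference_plan = {
    "search_flights": set(),
    "search_hotels": set(),
    "compare_prices": {"search_flights", "search_hotels"},
    "book_trip": {"compare_prices"},
}


def execution_order(plan):
    """Return one valid execution order; raises CycleError if the plan has a cycle."""
    return list(TopologicalSorter(plan).static_order())


def nodes(plan):
    """All tools mentioned in the plan, as keys or as dependencies."""
    found = set(plan)
    for deps in plan.values():
        found |= set(deps)
    return found


def edges(plan):
    """Directed edges (dependency, tool) of the plan."""
    return {(dep, tool) for tool, deps in plan.items() for dep in deps}


def f1(predicted, reference):
    """Set-overlap F1; two empty sets count as a perfect match."""
    if not predicted and not reference:
        return 1.0
    overlap = len(predicted & reference)
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)


def graph_reward(predicted, reference):
    """Illustrative reward in [0, 1]: 0 for cyclic plans, else mean of node and edge F1."""
    try:
        execution_order(predicted)
    except CycleError:
        return 0.0
    node_score = f1(nodes(predicted), nodes(reference))
    edge_score = f1(edges(predicted), edges(reference))
    return 0.5 * (node_score + edge_score)


# A predicted plan that selects the right tools but misses one dependency.
predicted_plan = {
    "search_flights": set(),
    "search_hotels": set(),
    "compare_prices": {"search_flights"},
    "book_trip": {"compare_prices"},
}

print(execution_order(reference_plan))
print(graph_reward(predicted_plan, reference_plan))
```

Unlike an exact-match reward, a score of this kind gives partial credit to a plan that picks the right tools but wires one dependency incorrectly, which is one plausible way a reward can make use of topological structure.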
