ASTRA: Automated Synthesis of agentic Trajectories and Reinforcement Arenas

Xiaoyu Tian; Haotian Wang; Shuaiting Chen; Hao Zhou; Kaichi Yu; Yudian Zhang; Jade Ouyang; Junxi Yin; Jiong Chen; Baoyan Guo; Lei Zhang; Junjie Tao; Yuansheng Song; Ming Cui; Chengwei Liu

arXiv:2601.21558 · cs.CL · February 2, 2026

TL;DR

ASTRA is an automated framework that synthesizes diverse training trajectories and environments for robust, tool-augmented language model agents, enabling scalable, verifiable reinforcement learning and achieving state-of-the-art performance.

Contribution

The paper introduces ASTRA, a fully automated end-to-end system combining data synthesis and verifiable RL for training tool-augmented language models, reducing manual intervention and improving stability.

Findings

01. Achieves state-of-the-art performance on multiple benchmarks.

02. Approaches the capabilities of closed-source systems.

03. Maintains core reasoning abilities while enhancing tool use.

Abstract

Large language models (LLMs) are increasingly used as tool-augmented agents for multi-step decision making, yet training robust tool-using agents remains challenging. Existing methods still require manual intervention, depend on non-verifiable simulated environments, rely exclusively on either supervised fine-tuning (SFT) or reinforcement learning (RL), and struggle with stable long-horizon, multi-turn learning. To address these challenges, we introduce ASTRA, a fully automated end-to-end framework for training tool-augmented language model agents via scalable data synthesis and verifiable reinforcement learning. ASTRA integrates two complementary components. First, a pipeline that leverages the static topology of tool-call graphs synthesizes diverse, structurally grounded trajectories, instilling broad and transferable tool-use competence. Second, an environment synthesis framework…

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Reinforcement Learning in Robotics