Discovery and Reinforcement of Tool-Integrated Reasoning Chains via Rollout Trees

Kun Li; Zenan Xu; Junan Li; Zengrui Jin; Jinghao Deng; Zexuan Qiu; Bo Zhou

arXiv:2601.08274 · cs.CL · January 19, 2026

TL;DR

This paper introduces DART, a reinforcement learning framework that enables large language models to spontaneously incorporate tools into long reasoning chains, improving performance without requiring human annotations.

Contribution

DART is a novel reinforcement learning approach that constructs rollout trees to discover and reinforce effective tool-use in long chain-of-thought reasoning without human supervision.

Findings

01. DART outperforms existing methods on the AIME and GPQA-Diamond benchmarks.

02. During training, it discovers beneficial tool-use trajectories.

03. DART enhances the long-reasoning capabilities of language models that integrate tools.

Abstract

Tool-Integrated Reasoning has emerged as a key paradigm to augment Large Language Models (LLMs) with computational capabilities, yet integrating tool-use into long Chain-of-Thought (long CoT) remains underexplored, largely due to the scarcity of training data and the challenge of integrating tool-use without compromising the model's intrinsic long-chain reasoning. In this paper, we introduce DART (Discovery And Reinforcement of Tool-Integrated Reasoning Chains via Rollout Trees), a reinforcement learning framework that enables spontaneous tool-use during long CoT reasoning without human annotation. DART operates by constructing dynamic rollout trees during training to discover valid tool-use opportunities, branching out at promising positions to explore diverse tool-integrated trajectories. Subsequently, a tree-based process advantage estimation identifies and credits specific…
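
The abstract describes two mechanisms: rollout trees that branch at promising positions, and a tree-based process advantage estimate that credits particular steps. The exact estimator is not given in the text above, so the following Python is only a minimal sketch under stated assumptions: each node's value is taken to be the mean terminal reward of the leaves beneath it, and a branch's advantage is its value minus its parent's value. The names `Node`, `value`, and `branch_advantages`, as well as the toy rewards, are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """One segment of a rollout. Only leaves carry a terminal reward."""
    name: str  # assumed unique within the tree (used as a dict key below)
    reward: Optional[float] = None  # e.g. 1.0 if the final answer is correct
    children: List["Node"] = field(default_factory=list)


def leaf_rewards(node: Node) -> List[float]:
    """Collect the terminal rewards of all leaves under `node`."""
    if not node.children:
        return [node.reward if node.reward is not None else 0.0]
    rewards: List[float] = []
    for child in node.children:
        rewards.extend(leaf_rewards(child))
    return rewards


def value(node: Node) -> float:
    """Assumed value estimate: mean terminal reward over the subtree's leaves."""
    rewards = leaf_rewards(node)
    return sum(rewards) / len(rewards)


def branch_advantages(root: Node) -> Dict[str, float]:
    """Assumed advantage of each branch: value(child) - value(parent)."""
    advantages: Dict[str, float] = {}
    stack = [root]
    while stack:
        node = stack.pop()
        for child in node.children:
            advantages[child.name] = value(child) - value(node)
            stack.append(child)
    return advantages


# Toy tree: at one position the rollout either keeps reasoning in text
# or branches into a tool call; each branch is continued twice.
tree = Node("prefix", children=[
    Node("no_tool", children=[Node("a1", reward=0.0), Node("a2", reward=1.0)]),
    Node("tool_call", children=[Node("b1", reward=1.0), Node("b2", reward=1.0)]),
])

adv = branch_advantages(tree)
print(adv["no_tool"], adv["tool_call"])  # -0.25 0.25
```

In this toy tree the tool-call branch receives a positive advantage because its continuations succeed more often than the average continuation from the shared prefix, while the text-only branch receives a negative one. A real system would score model-generated trajectories and feed such per-branch advantages into a policy-gradient update; that part is omitted here.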

Taxonomy

Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Multimodal Machine Learning Applications