Teaching Thinking Models to Reason with Tools: A Full-Pipeline Recipe for Tool-Integrated Reasoning

Qianjia Cheng; Yuchen Zhang; Zhilin Wang; Yuxin Zuo; Shunkai Zhang; Yuchen Fan; Yu Qiao; Bowen Zhou; Ning Ding; Yu Cheng; Yun Luo; Ganqu Cui

arXiv:2605.06326 · cs.CL · May 8, 2026

TL;DR

This paper presents a full-pipeline recipe for injecting tool-use behavior into strong thinking models, so that tool-integrated reasoning improves performance without degrading the models' original text-only reasoning.

Contribution

The paper introduces a full-pipeline approach to tool-integrated reasoning (TIR) that combines supervised fine-tuning (SFT) and reinforcement learning techniques to improve model performance and stability.

Findings

01. Models trained with the recipe achieve state-of-the-art performance on benchmarks such as AIME 2025.

02. Controlling the proportion of tool-use trajectories mitigates catastrophic forgetting of text-only reasoning.

03. Optimizing for pass@k and response length, rather than training loss, enhances the benefits of TIR.
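For readers unfamiliar with the pass@k metric named above, here is a minimal sketch of the widely used unbiased estimator, 1 - C(n-c, k) / C(n, k), computed from n sampled responses of which c are correct. Whether the paper computes pass@k in exactly this way is an assumption; the function name `pass_at_k` and the sample counts are illustrative only.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k from n sampled responses, c of which are correct.

    Assumption: this is the standard unbiased estimator; the paper's
    exact evaluation protocol is not shown in this summary.
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # If fewer than k responses are incorrect, every size-k subset
    # must contain at least one correct response.
    if n - c < k:
        return 1.0
    # Probability that a random size-k subset contains no correct response,
    # subtracted from 1.
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # Hypothetical counts: 16 samples for one problem, 4 of them correct.
    for k in (1, 4, 8):
        print(f"pass@{k} = {pass_at_k(16, 4, k):.3f}")
```

Averaging this quantity over all problems in a benchmark gives the benchmark-level pass@k.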

Abstract

Tool-integrated reasoning (TIR) offers a direct way to extend thinking models beyond the limits of text-only reasoning. Paradoxically, we observe that tool-enabled evaluation can degrade reasoning performance even when the strong thinking models make almost no actual tool calls. In this paper, we investigate how to inject natural tool-use behavior into a strong thinking model without sacrificing its no-tool reasoning ability, and present a comprehensive TIR recipe. We highlight that (i) the effectiveness of TIR supervised fine-tuning (SFT) hinges on the learnability of teacher trajectories, which should prioritize problems inherently suited for tool-augmented solutions; (ii) controlling the proportion of tool-use trajectories could mitigate the catastrophic forgetting of text-only reasoning capacity; (iii) optimizing for pass@k and response length instead of training loss could maximize…
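Point (ii) of the abstract, controlling the proportion of tool-use trajectories, can be illustrated with a small data-mixing sketch. This is not the authors' pipeline: the function `mix_trajectories`, the record fields, and the 30% tool-use fraction are all hypothetical, chosen only to show what fixing the proportion in an SFT mixture might look like.

```python
import random


def mix_trajectories(tool_trajs, text_trajs, tool_fraction, total, seed=0):
    """Build an SFT mixture with a fixed fraction of tool-use trajectories.

    Assumption: trajectories are sampled without replacement and the
    fraction is enforced on the final mixture size. The paper's actual
    mixing procedure is not described in this summary.
    """
    if not 0.0 <= tool_fraction <= 1.0:
        raise ValueError("tool_fraction must lie in [0, 1]")
    n_tool = round(total * tool_fraction)
    n_text = total - n_tool
    if n_tool > len(tool_trajs) or n_text > len(text_trajs):
        raise ValueError("not enough trajectories for the requested mixture")
    rng = random.Random(seed)
    mixed = rng.sample(tool_trajs, n_tool) + rng.sample(text_trajs, n_text)
    rng.shuffle(mixed)  # avoid grouping all tool-use examples together
    return mixed


if __name__ == "__main__":
    # Hypothetical pools of teacher trajectories.
    tool_pool = [{"id": i, "uses_tool": True} for i in range(100)]
    text_pool = [{"id": i, "uses_tool": False} for i in range(100)]
    mixture = mix_trajectories(tool_pool, text_pool, tool_fraction=0.3, total=50)
    share = sum(t["uses_tool"] for t in mixture) / len(mixture)
    print(f"{len(mixture)} examples, tool-use share = {share:.2f}")
```

Sweeping `tool_fraction` and measuring no-tool accuracy at each setting would be one way to look for the forgetting trade-off the abstract describes.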
