Guided by Trajectories: Repairing and Rewarding Tool-Use Trajectories for Tool-Integrated Reasoning

Siyu Gong; Linan Yue; Weibo Gao; Fangzhou Yao; Shimin Di; Lei Feng; Min-Ling Zhang

arXiv:2601.23032 · cs.AI · February 2, 2026

Open Access

TL;DR

AutoTraj is a two-stage framework that automatically repairs and rewards tool-use trajectories, improving how reliably large language models perform tool-integrated reasoning (TIR).

Contribution

This paper introduces AutoTraj, a two-stage method that repairs and rewards tool-use trajectories, enabling more reliable and effective tool-integrated reasoning in large language models.

Findings

01. AutoTraj outperforms existing methods on real-world benchmarks.

02. Repaired trajectories enhance the quality of supervised fine-tuning.

03. Trajectory-level reward modeling improves reasoning path reliability.

Abstract

Tool-Integrated Reasoning (TIR) enables large language models (LLMs) to solve complex tasks by interacting with external tools. Existing approaches, however, depend on high-quality synthesized trajectories selected by scoring functions and on sparse outcome-based rewards, which provide limited and biased supervision for learning TIR. To address these challenges, we propose AutoTraj, a two-stage framework that automatically learns TIR by repairing and rewarding tool-use trajectories. Specifically, in the supervised fine-tuning (SFT) stage, AutoTraj generates multiple candidate tool-use trajectories for each query and evaluates them along multiple dimensions. High-quality trajectories are directly retained, while low-quality ones are repaired using an LLM (i.e., LLM-as-Repairer). The resulting repaired and high-quality trajectories form a synthetic SFT dataset, while each repaired…
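
The SFT-stage data construction described in the abstract (generate candidates, evaluate along several dimensions, retain the good ones, repair the rest) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the `Trajectory` class, the `build_sft_dataset` function, the all-dimensions-above-`threshold` rule, and the toy generator, evaluator, and repairer are all hypothetical stand-ins. In practice the generator and repairer would be LLM calls, and the abstract does not specify the evaluation dimensions or how they are combined.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Trajectory:
    """Hypothetical container for one tool-use trajectory."""
    query: str
    steps: List[str]          # reasoning steps and tool calls, in order
    repaired: bool = False    # True if the trajectory went through the repairer


def build_sft_dataset(
    queries: List[str],
    generate: Callable[[str, int], List[Trajectory]],
    evaluate: Callable[[Trajectory], Dict[str, float]],
    repair: Callable[[Trajectory], Trajectory],
    num_candidates: int = 4,
    threshold: float = 0.5,
) -> List[Trajectory]:
    """Retain high-quality trajectories and repair low-quality ones."""
    dataset: List[Trajectory] = []
    for query in queries:
        for traj in generate(query, num_candidates):
            scores = evaluate(traj)  # one score per evaluation dimension
            # Assumption: "high quality" means every dimension clears the threshold.
            if min(scores.values()) >= threshold:
                dataset.append(traj)
            else:
                fixed = repair(traj)  # stand-in for LLM-as-Repairer
                fixed.repaired = True
                dataset.append(fixed)
    return dataset


# --- Toy stand-ins so the sketch runs without any model (all hypothetical) ---

def toy_generate(query: str, n: int) -> List[Trajectory]:
    # Even-indexed candidates call a tool; odd-indexed ones skip it.
    return [
        Trajectory(query, ["call calculator", "answer"] if i % 2 == 0 else ["answer"])
        for i in range(n)
    ]


def toy_evaluate(traj: Trajectory) -> Dict[str, float]:
    return {
        "uses_tool": 1.0 if any(s.startswith("call") for s in traj.steps) else 0.0,
        "has_answer": 1.0 if traj.steps and traj.steps[-1] == "answer" else 0.0,
    }


def toy_repair(traj: Trajectory) -> Trajectory:
    # Prepend the missing tool call; a real repairer would rewrite with an LLM.
    return Trajectory(traj.query, ["call calculator"] + traj.steps)


if __name__ == "__main__":
    data = build_sft_dataset(["q1", "q2"], toy_generate, toy_evaluate, toy_repair,
                             num_candidates=2)
    for t in data:
        print(t.query, t.steps, "repaired" if t.repaired else "retained")
```

Passing the generator, evaluator, and repairer in as callables keeps the retain-or-repair logic separate from whichever models are used for each role.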

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Natural Language Processing Techniques