PORTool: Importance-Aware Policy Optimization with Rewarded Tree for Multi-Tool-Integrated Reasoning

Feijie Wu; Weiwu Zhu; Yuxiang Zhang; Soumya Chatterjee; Jiarong Zhu; Fan Mo; Rong Luo; Jing Gao

arXiv:2510.26020·cs.CL·May 4, 2026

TL;DR

PORTool is an importance-aware policy-optimization method that improves multi-tool reasoning in language models by assigning step-level credit to intermediate tool-use steps from outcome-only rewards, leading to higher final-answer accuracy with fewer tool calls.

Contribution

It introduces a step-wise importance estimation technique within a rewarded rollout tree to enhance tool-use policy training from outcome-only rewards.

Findings

1. PORTool outperforms existing baselines in final-answer accuracy.
2. It reduces the number of tool-call steps needed for reasoning.
3. Ablation studies validate the robustness of the importance estimates.

Abstract

Multi-tool-integrated reasoning enables LLM-empowered tool-use agents to solve complex tasks by interleaving natural-language reasoning with calls to external tools. However, training such agents from outcome-only rewards suffers from credit-assignment ambiguity, obscuring which intermediate tool-use decisions drive success or failure. In this paper, we propose PORTool, an importance-aware policy-optimization algorithm that reinforces agents' tool-use competence from outcome-level supervision while assigning reward at the step level. Specifically, PORTool generates a rewarded rollout tree in which trajectories share prefixes before branching, enabling direct comparisons among alternative tool-use decisions within the same context. It then estimates each step's importance by a correctness-dominant signal, i.e., whether descendants of that step can ultimately produce a correct final…
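The abstract describes the mechanism only at a high level, and its definition of the importance signal is cut off, so what follows is not PORTool's algorithm. It is a minimal Python sketch of two ideas the abstract names: trajectories that share a prefix before branching, and a step signal based on whether a step's descendants reach a correct final answer. It assumes each leaf carries a binary outcome reward (correct or not). The class `StepNode`, the functions `reaches_correct`, `success_rate`, and `sibling_advantages`, the scoring rule, and the sibling-mean baseline are all hypothetical choices made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class StepNode:
    """Hypothetical rollout-tree node representing one tool-use step.

    Every child continues the same trajectory prefix that ends at this node,
    so siblings are alternative decisions made in an identical context.
    A leaf is a finished trajectory with an outcome-only reward.
    """

    action: str
    children: list[StepNode] = field(default_factory=list)
    correct: bool | None = None  # set on leaves only: was the final answer correct?

    def leaves(self) -> Iterator[StepNode]:
        """Yield every finished trajectory that passes through this step."""
        if not self.children:
            yield self
        for child in self.children:
            yield from child.leaves()


def reaches_correct(node: StepNode) -> bool:
    """Assumed correctness signal: can any descendant trajectory end correctly?"""
    return any(leaf.correct for leaf in node.leaves())


def success_rate(node: StepNode) -> float:
    """Fraction of descendant trajectories whose final answer is correct."""
    leaves = list(node.leaves())
    return sum(bool(leaf.correct) for leaf in leaves) / len(leaves)


def sibling_advantages(parent: StepNode) -> list[float]:
    """Hypothetical step-level credit for each child of `parent`.

    Score = reachability (0 or 1) + success rate, so a step that can still
    reach a correct answer always outranks one that cannot. Each score is
    then centered on the mean score of its siblings.
    """
    if not parent.children:
        return []
    scores = [float(reaches_correct(c)) + success_rate(c) for c in parent.children]
    mean = sum(scores) / len(scores)
    return [s - mean for s in scores]


# Example: three alternative first steps taken from the same prompt.
root = StepNode("prompt", [
    StepNode("search", [StepNode("answer_a", correct=True),
                        StepNode("answer_b", correct=False)]),
    StepNode("calculator", correct=False),
    StepNode("lookup", correct=True),
])
print(sibling_advantages(root))
```

In this example, "lookup" receives the largest credit, "search" a smaller positive credit (only half of its descendants are correct), and "calculator" negative credit because none of its descendants reach a correct answer. Centering on the sibling mean is one simple way to turn scores into relative credit among alternatives from the same context; the paper may weight or normalize steps differently.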
