PORTool: Importance-Aware Policy Optimization with Rewarded Tree for Multi-Tool-Integrated Reasoning
Feijie Wu, Weiwu Zhu, Yuxiang Zhang, Soumya Chatterjee, Jiarong Zhu, Fan Mo, Rong Luo, Jing Gao

TL;DR
PORTool is a novel importance-aware policy optimization method that improves multi-tool reasoning in language models by effectively assigning rewards to intermediate steps, leading to better accuracy and efficiency.
Contribution
It introduces a step-wise importance estimation technique within a rewarded rollout tree to enhance tool-use policy training from outcome-only rewards.
Findings
PORTool outperforms existing baselines in final-answer accuracy.
It reduces the number of tool-call steps needed for reasoning.
Ablation studies validate the robustness of importance estimates.
Abstract
Multi-tool-integrated reasoning enables LLM-empowered tool-use agents to solve complex tasks by interleaving natural-language reasoning with calls to external tools. However, training such agents from outcome-only rewards suffers from credit-assignment ambiguity, obscuring which intermediate tool-use decisions drive success or failure. In this paper, we propose PORTool, an importance-aware policy-optimization algorithm that reinforces agents' tool-use competence from outcome-level supervision while assigning reward at the step level. Specifically, PORTool generates a rewarded rollout tree in which trajectories share prefixes before branching, enabling direct comparisons among alternative tool-use decisions within the same context. It then estimates each step's importance by a correctness-dominant signal, i.e., whether descendants of that step can ultimately produce a correct final…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
