AutoTool: Dynamic Tool Selection and Integration for Agentic Reasoning

Jiaru Zou; Ling Yang; Yunzhe Qi; Sirui Chen; Mengting Ai; Ke Shen; Jingrui He; Mengdi Wang

arXiv:2512.13278 · cs.CL · December 16, 2025

TL;DR

AutoTool enables large language models to dynamically select and integrate external tools during reasoning, improving adaptability and performance across diverse tasks by using a novel dataset and a dual-phase optimization pipeline.

Contribution

The paper introduces AutoTool, a framework for dynamic tool selection in LLM agents, with a new dataset and optimization methods that enhance reasoning and generalization.

Findings

01. AutoTool outperforms existing methods on multiple benchmarks.
02. It achieves average gains of 6.4% in math and science reasoning.
03. AutoTool generalizes to unseen tools during inference.
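Generalizing to unseen tools depends on what the reviews below call embedding-anchored tool selection. This page does not describe the paper's exact mechanism, so the sketch below assumes the simplest form of the idea: at each selection step the agent produces a query embedding, each tool is represented by an embedding of its description, and the tool with the highest cosine similarity wins. The function names, tool names, and vectors are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_tool(query_emb, tool_table):
    """Pick the tool whose embedding is most similar to the agent's query embedding.

    Assumption (not from the paper): tool_table maps a tool name to an embedding
    of its description. The selector has no per-tool parameters, so a tool never
    seen in training becomes usable as soon as its row is added.
    """
    return max(tool_table, key=lambda name: cosine(query_emb, tool_table[name]))


# Hypothetical 3-d embeddings, for illustration only.
tools = {
    "calculator": [1.0, 0.0, 0.0],
    "python_exec": [0.0, 1.0, 0.0],
}
query = [0.1, 0.2, 0.9]
print(select_tool(query, tools))  # python_exec

# An unseen tool registered at inference time, with no retraining.
tools["image_crop"] = [0.0, 0.0, 1.0]
print(select_tool(query, tools))  # image_crop
```

The design point illustrated is that selection scores come from a similarity function over tool representations rather than from a fixed output head with one slot per tool. A fixed head would need new parameters for every new tool.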

Abstract

Agentic reinforcement learning has advanced large language models (LLMs) to reason through long chain-of-thought trajectories while interleaving external tool use. Existing approaches assume a fixed inventory of tools, limiting LLM agents' adaptability to new or evolving toolsets. We present AutoTool, a framework that equips LLM agents with dynamic tool-selection capabilities throughout their reasoning trajectories. We first construct a 200k dataset with explicit tool-selection rationales across 1,000+ tools and 100+ tasks spanning mathematics, science, code generation, and multimodal reasoning. Building on this data foundation, AutoTool employs a dual-phase optimization pipeline: (i) supervised and RL-based trajectory stabilization for coherent reasoning, and (ii) KL-regularized Plackett-Luce ranking to refine consistent multi-step tool selection. Across ten diverse benchmarks, we…
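Phase II of the pipeline is described as KL-regularized Plackett-Luce (PL) ranking over tool choices. Under the standard PL model, an ordering π of tools with scores s has probability P(π | s) = ∏_k exp(s_π(k)) / Σ_{j ≥ k} exp(s_π(j)). For a top-k prefix, each denominator runs over all tools not yet placed. The sketch below computes the PL negative log-likelihood of a preferred tool ranking and adds a β-weighted KL penalty that keeps the current tool distribution close to a reference one. This page does not specify how the paper defines the reference policy, where the KL is applied, or what β is. `ref_scores`, `beta=0.1`, and the use of the top-1 softmax distribution in the KL term are therefore assumptions.

```python
import math


def logsumexp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def plackett_luce_nll(scores, ranking):
    """Negative log-likelihood of `ranking` (best first) under Plackett-Luce.

    scores: dict mapping tool name -> real-valued score (logit).
    ranking: list of tool names; may be a top-k prefix of all tools.
    """
    remaining = list(scores)
    nll = 0.0
    for tool in ranking:
        # log-probability that `tool` is picked next among all tools not yet placed
        nll -= scores[tool] - logsumexp([scores[t] for t in remaining])
        remaining.remove(tool)
    return nll


def softmax(scores):
    """Top-1 choice distribution implied by the scores."""
    lse = logsumexp(list(scores.values()))
    return {t: math.exp(s - lse) for t, s in scores.items()}


def kl_divergence(p, q):
    """KL(p || q) for two distributions over the same tool names."""
    return sum(p[t] * math.log(p[t] / q[t]) for t in p if p[t] > 0)


def kl_regularized_pl_loss(scores, ref_scores, ranking, beta=0.1):
    """PL ranking loss plus a beta-weighted KL penalty toward a reference policy.

    Assumption: the KL term compares top-1 softmax distributions of the current
    and reference scores; the paper's exact regularizer may differ.
    """
    kl = kl_divergence(softmax(scores), softmax(ref_scores))
    return plackett_luce_nll(scores, ranking) + beta * kl


# Hypothetical tool scores, for illustration only.
scores = {"calculator": 2.0, "python_exec": 1.0, "search": 0.0}
ref_scores = {"calculator": 1.5, "python_exec": 1.0, "search": 0.5}
print(round(plackett_luce_nll(scores, ["calculator", "python_exec", "search"]), 4))  # 0.7209
print(kl_regularized_pl_loss(scores, ref_scores, ["calculator", "python_exec"]))
```

Minimizing the first term raises the likelihood of the whole preferred ordering, not just the single best tool. The β term limits how far tool preferences drift from the reference during this refinement.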

Peer Reviews

Decision: Submitted to ICLR 2026

Reviewer 01 · Rating 6 · Confidence 3

Strengths

The paper addresses a genuine limitation in existing work—most approaches assume fixed toolsets, whereas real-world scenarios require dynamic tool selection from evolving inventories. The dual-phase optimization pipeline is well-designed, with Phase I establishing stable reasoning patterns and Phase II specifically targeting tool-selection refinement through PL ranking.

Weaknesses

While the combination is effective, the individual components (SFT, GRPO, Plackett-Luce ranking) are well-established techniques. The main contribution appears to be applying PL ranking to tool selection, which is somewhat incremental. The paper would benefit from discussing recent work on tool retrieval and generation. There are also notation inconsistencies: the paper switches between τ and T for trajectories and trajectory sets.

Reviewer 02 · Rating 4 · Confidence 4

Strengths

- Comprehensive empirical results spanning a diverse set of evaluation datasets.
- Results are compared against relevant baselines, such as stronger reasoning models, existing tool-integration methods, and traditional fine-tuning.
- Strong results: the proposed AutoTool framework achieves consistent gains over multiple approaches across the diverse datasets.

Weaknesses

- I could not find results on generalization to unseen tools during inference. The key claim of the embedding-anchored selection method is that it should adapt dynamically to new tools provided at inference, but none of the experimental results seem to highlight this.
- I am not sure I follow why the analysis of AutoTool with an oracle tool-assignment agent is needed. Ideally, the oracle numbers should be present in Table 1 to directly compare other methods on how

Reviewer 03 · Rating 4 · Confidence 4

Strengths

- AutoTool innovatively integrates embedding-anchored tool selection and KL-regularized PL ranking into the learning of LLM agents, which contributes to decent originality.
- The presentation of AutoTool’s dual-phase learning scheme is theoretically well-motivated and mathematically well-grounded.
- AutoTool’s proposed challenge of dynamic tool selection under evolving tool environments is crucial for developing robust and scalable LLM agent frameworks.

Weaknesses

- The experimental analysis of this paper falls short of justifying AutoTool’s effectiveness at improving dynamic tool selection under evolving tool environments, i.e., whether AutoTool selects tools better when generalizing to unseen toolsets, which is the most significant challenge the paper raises. Evaluation on a new or held-out set of tools and tasks unseen during training would help justify this important point.
- It is unclear how the evolving toolse

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Natural Language Processing Techniques