Are Tools Always Beneficial? Learning to Invoke Tools Adaptively for Dual-Mode Multimodal LLM Reasoning

Qinghe Ma; Zhen Zhao; Yiming Wu; Jian Zhang; Lei Bai; Yinghuan Shi

arXiv:2605.19852·cs.CL·May 20, 2026

TL;DR

AutoTool is a novel adaptive reasoning model for multimodal large language models that selectively invokes tools based on query characteristics, improving accuracy and efficiency.

Contribution

It introduces a reinforcement learning framework with dual-mode reasoning and balanced exploration to optimize tool invocation in multimodal reasoning.

Findings

01. Achieves a 21.8% accuracy gain on the V* benchmark.

02. Improves efficiency by 44.9% over existing methods.

03. Effectively balances tool-assisted and text-centric reasoning.

Abstract

Tool-augmented reasoning has emerged as a promising direction for enhancing the reasoning capabilities of multimodal large language models (MLLMs). However, existing studies mainly focus on enabling models to invoke tools, while neglecting whether invoking them is necessary. We argue that tool usage is not always beneficial: redundant or inappropriate invocations substantially increase reasoning overhead and can even mislead model predictions. To address this issue, we introduce AutoTool, a model that adaptively decides whether to invoke tools according to the characteristics of each query. Within a reinforcement learning framework, we design an explicit dual-mode reasoning strategy with mode-specific reward functions to guide the model toward producing accurate responses. Moreover, to prevent premature bias toward a single reasoning mode, AutoTool jointly explores and balances…
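
To make the idea of "mode-specific reward functions" concrete, here is a minimal illustrative sketch in Python. It is not the reward used by AutoTool, which this page does not specify. The `Rollout` type, the mode labels `"tool"` and `"text"`, the `call_cost` parameter, and the rule that the first tool call is free are all assumptions introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Rollout:
    """One sampled response (hypothetical structure, not from the paper)."""
    mode: str            # "tool" or "text"; labels are assumed for this sketch
    correct: bool        # whether the final answer matched the reference
    tool_calls: int = 0  # number of tool invocations in the response


def mode_reward(r: Rollout, call_cost: float = 0.1) -> float:
    """Assumed mode-specific reward: accuracy first, then a tool-call cost.

    Text mode is scored on correctness alone. Tool mode is also scored on
    correctness, but each call beyond the first subtracts `call_cost`,
    so redundant invocations earn less than a single useful one.
    """
    if r.mode == "text":
        return 1.0 if r.correct else 0.0
    if r.mode == "tool":
        if not r.correct:
            return 0.0
        extra_calls = max(0, r.tool_calls - 1)
        return max(0.0, 1.0 - call_cost * extra_calls)
    raise ValueError(f"unknown reasoning mode: {r.mode!r}")


if __name__ == "__main__":
    samples = [
        Rollout("text", True),
        Rollout("tool", True, tool_calls=1),
        Rollout("tool", True, tool_calls=3),
        Rollout("tool", False, tool_calls=2),
    ]
    for s in samples:
        print(s, mode_reward(s))
```

Under these assumptions, a correct text-only answer and a correct single-call answer score the same, so the policy has no incentive to call tools on queries it can already answer, while repeated calls are mildly discouraged.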

Code & Models

Repositories

MQinghe/AutoTool (GitHub)
