TL;DR
AutoTool is an adaptive reasoning model for multimodal large language models (MLLMs) that invokes tools only when a query's characteristics call for it, improving both accuracy and efficiency.
Contribution
It introduces a reinforcement learning framework with dual-mode reasoning and balanced exploration to optimize tool invocation in multimodal reasoning.
Findings
Achieves a 21.8% accuracy gain on the V* benchmark.
Improves efficiency by 44.9% over existing methods.
Effectively balances tool-assisted and text-centric reasoning.
Abstract
Tool-augmented reasoning has emerged as a promising direction for enhancing the reasoning capabilities of multimodal large language models (MLLMs). However, existing studies mainly focus on enabling models to perform tool invocation, while neglecting the necessity of invoking tools. We argue that tool usage is not always beneficial, as redundant or inappropriate invocations largely increase reasoning overhead and even mislead model predictions. To address this issue, we introduce AutoTool, a model that adaptively decides whether to invoke tools according to the characteristics of each query. Within a reinforcement learning framework, we design an explicit dual-mode reasoning strategy with mode-specific reward functions to guide the model toward producing accurate responses. Moreover, to prevent premature bias toward a single reasoning mode, AutoTool jointly explores and balances…
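The abstract mentions mode-specific reward functions but does not give their form. As a rough illustration only, the sketch below shows one way a reward could differ between a text-centric mode and a tool-assisted mode. The `Rollout` type, the mode labels, the `call_penalty` value, and the scoring rules are all assumptions made for this example; they are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Rollout:
    """One sampled response (hypothetical structure, not from the paper)."""
    mode: str        # "text" or "tool" (assumed labels for the two modes)
    correct: bool    # whether the final answer matched the reference
    tool_calls: int  # number of tool invocations made in this rollout


def mode_reward(r: Rollout, call_penalty: float = 0.1) -> float:
    """Illustrative mode-specific reward; all constants are assumptions."""
    base = 1.0 if r.correct else 0.0
    if r.mode == "text":
        # Text-centric mode: score on answer correctness alone.
        return base
    if r.mode == "tool":
        # Tool mode: a rollout that never calls a tool earns nothing,
        # and each call beyond the first costs a small penalty.
        if r.tool_calls == 0:
            return 0.0
        return max(0.0, base - call_penalty * (r.tool_calls - 1))
    raise ValueError(f"unknown mode: {r.mode!r}")
```

Under these assumed rules, a correct tool-mode rollout with redundant calls scores lower than a correct one with a single call, which mirrors the abstract's point that redundant invocations add overhead.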
