AdaReasoner: Dynamic Tool Orchestration for Iterative Visual Reasoning

Mingyang Song; Haoyu Sun; Jiawei Gu; Linjie Li; Luxin Xu; Ranjay Krishna; Yu Cheng

arXiv:2601.18631 · cs.AI · January 29, 2026



TL;DR

AdaReasoner is a multimodal model that learns to dynamically select and coordinate tools for visual reasoning, improving performance and generalization without explicit supervision.

Contribution

It introduces a scalable data pipeline, a reinforcement learning algorithm, and an adaptive mechanism for dynamic tool orchestration in multimodal reasoning models.

Findings

01 Achieves +24.9% improvement on average over baseline models.

02 Demonstrates strong tool-adaptive and generalization behaviors.

03 Outperforms proprietary systems like GPT-5 on multiple benchmarks.

Abstract

When humans face problems beyond their immediate capabilities, they rely on tools, providing a promising paradigm for improving visual reasoning in multimodal large language models (MLLMs). Effective reasoning, therefore, hinges on knowing which tools to use, when to invoke them, and how to compose them over multiple steps, even when faced with new tools or new tasks. We introduce AdaReasoner, a family of multimodal models that learn tool use as a general reasoning skill rather than as tool-specific or explicitly supervised behavior. AdaReasoner is enabled by (i) a scalable data curation pipeline exposing models to long-horizon, multi-step tool interactions; (ii) Tool-GRPO, a reinforcement learning algorithm that optimizes tool selection and sequencing based on end-task success; and (iii) an adaptive learning mechanism that dynamically regulates tool usage. Together, these…
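The abstract does not spell out how Tool-GRPO scores a tool-use trajectory. As a rough illustration only, the sketch below shows the group-relative advantage step of standard GRPO, in which each sampled rollout's reward is normalized against the other rollouts drawn for the same prompt. The reward function `toy_reward`, its `format_weight` parameter, and the use of the population standard deviation are assumptions made for this example and are not taken from the paper.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each rollout's reward against its own sampling group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption for this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


def toy_reward(answer_correct, format_ok, format_weight=0.1):
    """Hypothetical reward: end-task success plus a small formatting bonus."""
    return float(answer_correct) + format_weight * float(format_ok)


# Four hypothetical rollouts for one prompt: (answer_correct, format_ok).
rollouts = [(True, True), (False, True), (False, False), (True, True)]
rewards = [toy_reward(correct, fmt) for correct, fmt in rollouts]
advantages = group_relative_advantages(rewards)
```

Rollouts that score above the group mean receive positive advantages and are reinforced, while those below the mean are penalized. If every rollout in a group earns the same reward, all advantages are zero and that group contributes no learning signal.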

Peer Reviews

Decision · ICLR 2026 Poster

Reviewer 01 · Rating 4 · Confidence 3

Strengths

The paper addresses an important and timely challenge in multimodal AI: how to move beyond single-tool usage toward adaptive, multi-step tool coordination. The problem is clearly defined, the motivation is well grounded, and the proposed framework is logically structured. The design combining curated multi-turn trajectories with reinforcement learning represents a meaningful step toward more adaptive and interpretable reasoning systems. The writing is exceptionally clear and the presentation is

Weaknesses

While the empirical results are impressive, the methodological contribution is incremental. The proposed Tool-GRPO is effectively an application of existing GRPO with customized reward shaping and formatting constraints. The novelty lies primarily in system integration and data engineering rather than in algorithmic or theoretical innovation. A second limitation is the heavy reliance on manual, task-specific design. The “abstract problem-solving blueprints” that underpin the Cold Start data are

Reviewer 02 · Rating 6 · Confidence 4

Strengths

- A key limitation of the "rule-based reward structure" in R1-style methods is that it primarily optimizes the reasoning process and fails to directly improve the model’s underlying perceptual capabilities. AdaReasoner directly addresses this shortcoming: by leveraging the precise perceptual capabilities of external expert models and specialized tools, it ensures high-fidelity understanding of visual inputs, thereby enhancing the reliability of the entire reasoning pipeline.
- Unlike previous me

Weaknesses

The most notable weakness of this paper lies in the limitations of evaluating tool generalization ability, specifically the "oversimplified verification of new tools during inference" and "lack of adaptation to tool complexity". These limitations cast doubt on the generalizability of the research conclusions in more complex and diverse tool scenarios. See other Weaknesses in Questions.

Reviewer 03 · Rating 8 · Confidence 4

Strengths

The authors demonstrate that their method improves over a variety of baselines across several visual reasoning benchmarks, with sufficient ablation experiments as well. A common limitation of training-based approaches to tool-integrated reasoning is that they may not generalize to newly introduced tools. The authors address this by showing that, at inference time, adding an unseen tool (A*) improves performance.

Weaknesses

The authors show that RL training allows the model to learn how much to use different tools ("adopt", "discard", "modulate") and call this an emergent behavior at multiple points throughout the paper. However, the authors' use of the term "emergent behavior" could benefit from some clarification or definition. Generally, emergent behaviors refer to nonobvious or surprising capabilities that are not explicitly optimized in the objective and that generally only "emerge" at scale. In this case, the method is deli


Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Mobile Crowdsensing and Crowdsourcing