ToolScope: An Agentic Framework for Vision-Guided and Long-Horizon Tool Use

Mengjie Deng; Guanting Dong; Zhicheng Dou

arXiv:2510.27363 · cs.AI · November 3, 2025



TL;DR

ToolScope is an agentic framework that improves the ability of multimodal large language models (MLLMs) to perform long-horizon visual question answering (VQA) by combining global planning with local perception tools, which improves answer accuracy.

Contribution

It introduces ToolScope, a unified agentic framework combining global planning and local perception tools for better multimodal reasoning in long-horizon tasks.

Findings

01. Achieves up to +6.69% performance improvement across benchmarks.

02. Demonstrates strong generalization across diverse domains.

03. Effectively mitigates visual context degradation in VQA tasks.

Abstract

Recently, large language models (LLMs) have demonstrated remarkable problem-solving capabilities by autonomously integrating with external tools for collaborative reasoning. However, because multimodal information is inherently complex and diverse, enabling multimodal large language models (MLLMs) to use external tools flexibly and efficiently during reasoning remains an underexplored challenge. In this work, we introduce ToolScope, an agentic framework designed to unify global planning with local multimodal perception, adopting a specialized Perceive tool to mitigate visual context degradation in long-horizon VQA tasks. ToolScope comprises three primary components: the Global Navigator, the Agentic Executor, and the Response Synthesizer. The Global Navigator functions as a "telescope", offering high-level strategic guidance. The Agentic Executor operates iteratively to…
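The abstract names three cooperating components (a planner, an iterative tool-using executor, and a synthesizer) plus a Perceive tool. As an illustration only, here is a minimal sketch of how such a navigator / executor / synthesizer loop could be wired together; it is not the authors' implementation. Every name (`global_navigator`, `perceive`, `agentic_executor`, `response_synthesizer`, `toy_policy`) is hypothetical, the image is mocked as a dictionary from query to observed content, and all MLLM calls are replaced with deterministic stubs.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical mock: an "image" maps a targeted query to what perception would return.
Image = Dict[str, str]
Action = Optional[Tuple[str, str]]  # (tool_name, query), or None to stop


@dataclass
class Trace:
    question: str
    guidance: str = ""
    steps: List[Tuple[str, str, str]] = field(default_factory=list)  # (tool, query, observation)


def global_navigator(question: str) -> str:
    """Hypothetical 'telescope' stub: high-level guidance produced before execution."""
    return f"Locate the visual evidence needed to answer: {question}"


def perceive(image: Image, query: str) -> str:
    """Hypothetical Perceive stub: re-query the image for a targeted detail
    instead of relying on visual context carried over from earlier steps."""
    return image.get(query, "no evidence found")


def agentic_executor(trace: Trace, image: Image,
                     policy: Callable[[Trace], Action],
                     max_steps: int = 5) -> Trace:
    """Iteratively ask the policy for the next tool call and record each observation."""
    tools: Dict[str, Callable[[Image, str], str]] = {"perceive": perceive}
    for _ in range(max_steps):
        action = policy(trace)
        if action is None:  # the policy decides it has enough evidence
            break
        tool_name, query = action
        trace.steps.append((tool_name, query, tools[tool_name](image, query)))
    return trace


def response_synthesizer(trace: Trace) -> str:
    """Hypothetical synthesizer stub: answer from the most recent useful observation."""
    for _, _, observation in reversed(trace.steps):
        if observation != "no evidence found":
            return observation
    return "unable to answer"


def run(question: str, image: Image, policy: Callable[[Trace], Action]) -> str:
    trace = Trace(question=question, guidance=global_navigator(question))
    trace = agentic_executor(trace, image, policy)
    return response_synthesizer(trace)


# Usage with a toy policy that calls Perceive once and then stops.
def toy_policy(trace: Trace) -> Action:
    return None if trace.steps else ("perceive", "color of the car")


print(run("What color is the car?", {"color of the car": "red"}, toy_policy))  # red
```

The point of the sketch is the separation of roles: planning happens once up front, the executor re-queries the image through a tool at each step rather than carrying all visual detail in its context, and the synthesizer only sees the recorded trace. In a real system each stub would be an MLLM call.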


Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Advanced Graph Neural Networks