SpecEyes: Accelerating Agentic Multimodal LLMs via Speculative Perception and Planning

Haoyu Huang; Jinfa Huang; Zhongwei Wan; Xiawu Zheng; Rongrong Ji; Jiebo Luo

arXiv:2603.23483 · cs.CV · March 25, 2026

TL;DR

SpecEyes is a framework that accelerates agentic multimodal LLMs by using speculative planning with a lightweight model to predict execution paths, reducing latency and increasing throughput without sacrificing accuracy.

Contribution

It introduces a speculative acceleration method for agentic multimodal LLMs that combines a cognitive gating mechanism with a parallel funnel to improve their efficiency.

Findings

01

Achieves a 1.1x to 3.35x speedup over the baseline.

02

Maintains or improves accuracy, with gains of up to +6.7%.

03

Enhances system throughput under concurrent workloads.
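Why the speedup spans a range rather than a single number can be illustrated with a simple two-tier latency model. This model is our own assumption, not the paper's analysis: the lightweight model always runs first, and the full agentic pipeline runs only on the fraction of queries the gate rejects. It ignores any overlap or batching between stages. The function names and the timing values below are hypothetical and purely illustrative.

```python
def expected_latency(t_small: float, t_full: float, accept_rate: float) -> float:
    """Mean per-query latency under an assumed sequential two-tier model:
    the small model always runs; the full agentic pipeline runs only on the
    (1 - accept_rate) fraction of queries the gate rejects."""
    if not 0.0 <= accept_rate <= 1.0:
        raise ValueError("accept_rate must lie in [0, 1]")
    return t_small + (1.0 - accept_rate) * t_full


def expected_speedup(t_small: float, t_full: float, accept_rate: float) -> float:
    """Speedup relative to always running the full agentic pipeline."""
    return t_full / expected_latency(t_small, t_full, accept_rate)


if __name__ == "__main__":
    # Illustrative numbers only (small model 1 unit, full pipeline 10 units);
    # these are not measurements from the paper.
    for rate in (0.0, 0.5, 0.7, 0.9):
        print(f"accept_rate={rate:.1f}  speedup={expected_speedup(1.0, 10.0, rate):.2f}x")
```

Under this model the speedup grows with the acceptance rate and with the cost gap between the two tiers. A gate that rejects every query would make the system slightly slower than the baseline, because the speculative pass is pure overhead.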

Abstract

Agentic multimodal large language models (MLLMs) (e.g., OpenAI o3 and Gemini Agentic Vision) achieve remarkable reasoning capabilities through iterative visual tool invocation. However, the cascaded perception, reasoning, and tool-calling loops introduce significant sequential overhead. This overhead, termed agentic depth, incurs prohibitive latency and seriously limits system-level concurrency. To this end, we propose SpecEyes, an agentic-level speculative acceleration framework that breaks this sequential bottleneck. Our key insight is that a lightweight, tool-free MLLM can serve as a speculative planner to predict the execution trajectory, enabling early termination of expensive tool chains without sacrificing accuracy. To regulate this speculative planning, we introduce a cognitive gating mechanism based on answer separability, which quantifies the model's confidence for…
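The gating idea in the abstract lets a cheap, tool-free model answer directly when it is confident and hands the query to the full tool-using agent otherwise. The sketch below reduces this to that early-exit decision. It is a minimal illustration under stated assumptions, not the authors' implementation. The abstract is cut off before it defines answer separability, so this sketch uses a stand-in score: the margin between the top-1 and top-2 softmax probabilities over candidate answers. The model interfaces, `separability`, `speculative_answer`, and the 0.5 threshold are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Hypothetical interfaces (not from the paper): the small model returns candidate
# answers with unnormalised scores; the agentic model returns a final answer.
SmallModel = Callable[[str], Tuple[List[str], List[float]]]
AgenticModel = Callable[[str], str]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over candidate-answer scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def separability(probs: Sequence[float]) -> float:
    """Stand-in confidence score (assumption): top-1 minus top-2 probability."""
    ranked = sorted(probs, reverse=True)
    second = ranked[1] if len(ranked) > 1 else 0.0
    return ranked[0] - second


@dataclass
class GateResult:
    answer: str
    used_fallback: bool  # True if the expensive tool chain was invoked
    score: float


def speculative_answer(query: str, small: SmallModel, agent: AgenticModel,
                       threshold: float = 0.5) -> GateResult:
    """Accept the small model's answer if it is separable enough; else run the agent."""
    candidates, logits = small(query)
    probs = softmax(logits)
    score = separability(probs)
    if score >= threshold:
        best = max(range(len(probs)), key=probs.__getitem__)
        return GateResult(candidates[best], False, score)  # early termination
    return GateResult(agent(query), True, score)  # fall back to the tool chain


if __name__ == "__main__":
    def agent(q: str) -> str:
        return "dog (after tool calls)"

    def confident(q: str) -> Tuple[List[str], List[float]]:
        return ["cat", "dog"], [3.0, 0.0]

    def unsure(q: str) -> Tuple[List[str], List[float]]:
        return ["cat", "dog"], [0.1, 0.0]

    print(speculative_answer("What animal is shown?", confident, agent))
    print(speculative_answer("What animal is shown?", unsure, agent))
```

The threshold is the speed/accuracy knob in this sketch. Raising it routes more queries to the expensive agent, which is safer but slower. Lowering it terminates more tool chains early.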

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Reinforcement Learning in Robotics