HyperEyes: Dual-Grained Efficiency-Aware Reinforcement Learning for Parallel Multimodal Search Agents

Guankai Li; Jiabin Chen; Yi Xu; Xichen Zhang; Yuan Lu

arXiv:2605.07177 · cs.LG · May 12, 2026

TL;DR

HyperEyes introduces a parallel multimodal search agent that fuses visual grounding and retrieval into atomic actions, optimizing for efficiency and accuracy through dual-level reinforcement learning.

Contribution

The paper presents a dual-grained reinforcement learning framework and a new benchmark that evaluates search efficiency alongside accuracy.

Findings

01. HyperEyes-30B outperforms comparable agents by 9.9% in accuracy.

02. It achieves 5.3x fewer tool-call rounds on average.

03. The framework effectively balances search capability and inference efficiency.

Abstract

Existing multimodal search agents process target entities sequentially, issuing one tool call per entity and accumulating redundant interaction rounds whenever a query decomposes into independent sub-retrievals. We argue that effective multimodal agents should search wider rather than longer: dispatching multiple grounded queries concurrently within a round. To this end, we present HyperEyes, a parallel multimodal search agent that fuses visual grounding and retrieval into a single atomic action, enabling concurrent search across multiple entities while treating inference efficiency as a first-class training objective. HyperEyes is trained in two stages. For cold-start supervision, we develop a Parallel-Amenable Data Synthesis Pipeline covering visual multi-entity and textual multi-constraint queries, curating efficiency-oriented trajectories via Progressive Rejection Sampling. Building…
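To make the "wider rather than longer" idea concrete, the following is a minimal sketch of sequential versus concurrent dispatch, not the HyperEyes implementation. Every name in it (`GroundedQuery`, `search`, `sequential_rounds`, `parallel_round`, `run_demo`) is hypothetical, and the retrieval tool is a stub. It assumes only what the abstract states: an atomic action pairs a grounded image region with a query, and independent queries can be issued within the same round.

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class GroundedQuery:
    """Hypothetical atomic action: an image region plus the text to retrieve."""
    box: tuple  # (x1, y1, x2, y2) crop in the input image
    text: str


async def search(query: GroundedQuery) -> str:
    """Stub retrieval tool (assumption); a real agent would call a search backend."""
    await asyncio.sleep(0)  # yield control to simulate I/O latency
    return f"result for {query.text!r} at {query.box}"


async def sequential_rounds(queries):
    """One tool call per round, so rounds grow with the number of entities."""
    results, rounds = [], 0
    for q in queries:
        results.append(await search(q))
        rounds += 1
    return results, rounds


async def parallel_round(queries):
    """Dispatch all independent grounded queries concurrently in one round."""
    results = await asyncio.gather(*(search(q) for q in queries))
    return list(results), 1


def run_demo():
    """Run both strategies on three illustrative, made-up queries."""
    queries = [
        GroundedQuery((0, 0, 50, 50), "left landmark"),
        GroundedQuery((60, 0, 110, 50), "center logo"),
        GroundedQuery((120, 0, 170, 50), "right vehicle"),
    ]
    seq_results, seq_rounds = asyncio.run(sequential_rounds(queries))
    par_results, par_rounds = asyncio.run(parallel_round(queries))
    return seq_results, seq_rounds, par_results, par_rounds
```

In this toy setting both strategies return the same results in the same order, because `asyncio.gather` preserves input order; only the number of interaction rounds differs. The sketch covers dispatch only and says nothing about how the paper trains the agent to choose when parallel dispatch is appropriate.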

Code & Models

Repositories

deepexperience/HyperEyes
github
