Towards Long-horizon Agentic Multimodal Search

Yifan Du; Zikang Liu; Jinbiao Peng; Jie Wu; Junyi Li; Jinyang Li; Wayne Xin Zhao; Ji-Rong Wen

arXiv:2604.12890 · cs.CV · April 28, 2026

TL;DR

This paper introduces LMM-Searcher, a multimodal deep search framework that handles long-horizon search tasks efficiently by offloading visual data to an external file system, enabling scalable, high-performance multimodal reasoning over extended interactions.

Contribution

The paper proposes a novel file-based visual representation, a tailored fetch-image tool, and a data synthesis pipeline that together strengthen long-horizon multimodal search.
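
To make the file-based representation concrete, here is a minimal Python sketch under our own assumptions; it is not the code in the RUCAIBox/LMM-Searcher repository. The class name `VisualFileStore`, the hash-based UID format, the `<image uid=...>` placeholder string, and the `fetch_image` method are all hypothetical. Images are written to a directory, the agent's context keeps only a short textual UID, and a fetch-image style method reads the bytes back when the agent asks for them.

```python
import hashlib
import tempfile
from pathlib import Path


class VisualFileStore:
    """Hypothetical file-based store: images live on disk, the context holds only UIDs."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def offload(self, image_bytes: bytes, suffix: str = ".png") -> str:
        # Assumed UID scheme: a short, deterministic prefix of the content hash.
        uid = "img_" + hashlib.sha256(image_bytes).hexdigest()[:12]
        path = self.root / f"{uid}{suffix}"
        if not path.exists():  # identical images map to one file
            path.write_bytes(image_bytes)
        return uid

    def placeholder(self, uid: str) -> str:
        # Lightweight textual stand-in placed in the agent's context instead of pixels.
        return f"<image uid={uid}>"

    def fetch_image(self, uid: str) -> bytes:
        # Hypothetical fetch-image tool: load the image only when the agent requests it.
        matches = sorted(self.root.glob(f"{uid}.*"))
        if not matches:
            raise KeyError(f"unknown image UID: {uid}")
        return matches[0].read_bytes()


# Usage: offload an image, keep the placeholder in context, fetch it back later.
with tempfile.TemporaryDirectory() as tmp:
    store = VisualFileStore(Path(tmp))
    uid = store.offload(b"fake image bytes")
    note = store.placeholder(uid)
    assert store.fetch_image(uid) == b"fake image bytes"
```

Hashing the content means that seeing the same image twice yields the same UID and no duplicate file; this deduplication is a design choice of the sketch, not something the abstract specifies.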

Findings

01. Achieves state-of-the-art results on long-horizon benchmarks such as MM-BrowseComp.

02. Scales successfully to 100-turn search horizons.

03. Demonstrates strong generalizability across different base models.

Abstract

Multimodal deep search agents have shown great potential in solving complex tasks by iteratively collecting textual and visual evidence. However, managing the heterogeneous information and high token costs associated with multimodal inputs over long horizons remains a critical challenge, as existing methods often suffer from context explosion or the loss of crucial visual signals. To address this, we propose a novel Long-horizon MultiModal deep search framework, named LMM-Searcher, centered on a file-based visual representation mechanism. By offloading visual assets to an external file system and mapping them to lightweight textual identifiers (UIDs), our approach mitigates context overhead while preserving multimodal information for future access. We equip the agent with a tailored fetch-image tool, enabling a progressive, on-demand visual loading strategy for active perception.…
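
The on-demand loading idea can be illustrated with a second sketch. Everything here is an assumption for illustration: the `Turn` type, `build_context`, `estimate_tokens`, and the flat per-image cost `TOKENS_PER_IMAGE` are hypothetical, the token estimate is deliberately crude, and the numbers are not results from the paper. The abstract does not detail what makes the loading "progressive", so the sketch shows only the on-demand part: inlining every observed image makes context grow with the image count, whereas inlining only explicitly fetched images leaves the rest as a few placeholder tokens each.

```python
from dataclasses import dataclass, field

# Hypothetical flat cost per inlined image; an assumption, not a figure from the paper.
TOKENS_PER_IMAGE = 1000


@dataclass
class Turn:
    text: str
    image_uids: list[str] = field(default_factory=list)


def build_context(history: list[Turn], fetched: set[str]) -> list[dict]:
    """Inline only images the agent explicitly fetched; keep the rest as UID placeholders."""
    context = []
    for turn in history:
        context.append({"type": "text", "text": turn.text})
        for uid in turn.image_uids:
            if uid in fetched:
                context.append({"type": "image", "uid": uid})
            else:
                context.append({"type": "text", "text": f"<image uid={uid}>"})
    return context


def estimate_tokens(context: list[dict]) -> int:
    # Crude toy estimate: flat rate per image, one token per whitespace-separated word.
    total = 0
    for item in context:
        if item["type"] == "image":
            total += TOKENS_PER_IMAGE
        else:
            total += len(item["text"].split())
    return total


# Usage: 100 turns with two images each, comparing "inline everything" vs. on-demand.
history = [Turn(f"step {i} observation", [f"img_{i}_a", f"img_{i}_b"]) for i in range(100)]
every_uid = {uid for turn in history for uid in turn.image_uids}
inline_all = estimate_tokens(build_context(history, every_uid))  # 200300 under this toy model
on_demand = estimate_tokens(build_context(history, {"img_99_a"}))  # 1698 under this toy model
```

Under this toy model the gap comes entirely from replacing image payloads with short textual identifiers; real token costs depend on the model's image encoder and tokenizer.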

Code & Models

Repositories

RUCAIBox/LMM-Searcher (GitHub)
