InterLV-Search: Benchmarking Interleaved Multimodal Agentic Search

Bohan Hou; Jiuning Gu; Jiayan Guo; Ronghao Dang; Sicong Leng; Xin Li; Xuemeng Song; Jianfei Yang

arXiv:2605.07510 · cs.CV · May 11, 2026

TL;DR

InterLV-Search is a benchmark for interleaved multimodal agentic search that emphasizes visual evidence integration and search trajectory management; results on it expose the limitations of current systems.

Contribution

The paper presents a new benchmark with three levels, including open-web scenarios, and a standardized agent for evaluating interleaved language-vision search tasks.

Findings

01 Current models achieve below 50% accuracy on the benchmark.

02 Visual evidence seeking and multimodal evidence integration remain challenging.

03 The benchmark includes automated and human-supervised data pipelines.

Abstract

Existing benchmarks for multimodal agentic search evaluate multimodal search and visual browsing, but visual evidence is either confined to the input or treated as an answer endpoint rather than part of an interleaved search trajectory. We introduce InterLV-Search, a benchmark for Interleaved Language-Vision Agentic Search, in which textual and visual evidence is repeatedly used to condition later search. It contains 2,061 examples across three levels: active visual evidence seeking, controlled offline interleaved multimodal search, and open-web interleaved multimodal search. Beyond existing benchmarks, it also includes multimodal multi-branch samples that involve comparison between multiple entities during the evidence search. We construct Level 1 and Level 2 with automated pipelines and Level 3 with a machine-led, human-supervised open-web pipeline. We further provide…

Code & Models

Repositories

hbhalpha/InterLV-Search-Bench (GitHub)
