DeepWideSearch: Benchmarking Depth and Width in Agentic Information Seeking

Tian Lan; Bin Zhu; Qianghuai Jia; Junyang Ren; Haijun Li; Longyue Wang; Zhao Xu; Weihua Luo; Kaifu Zhang

arXiv:2510.20168·cs.CL·October 24, 2025

TL;DR

DeepWideSearch introduces a new benchmark to evaluate agents' ability to perform both deep reasoning and wide-scale information retrieval, revealing current limitations and guiding future improvements in agent architectures.

Contribution

This paper presents DeepWideSearch, the first benchmark explicitly designed to evaluate the integration of depth and width in agentic information seeking tasks.

Findings

01

State-of-the-art agents achieve only a 2.39% average success rate on the benchmark.

02

Four key failure modes are identified: lack of reflection, overreliance on internal knowledge, insufficient retrieval, and context overflow.

03

Benchmark is publicly released to foster future research.

Abstract

Current search agents fundamentally lack the ability to simultaneously perform deep reasoning over multi-hop retrieval and wide-scale information collection, a critical deficiency for real-world applications like comprehensive market analysis and business development. To bridge this gap, we introduce DeepWideSearch, the first benchmark explicitly designed to evaluate how well agents integrate depth and width in information seeking. In DeepWideSearch, agents must process a large volume of data, with each item requiring deep reasoning over multi-hop retrieval paths. Specifically, we propose two methods to convert established datasets, resulting in a curated collection of 220 questions spanning 15 diverse domains. Extensive experiments demonstrate that even state-of-the-art agents achieve only a 2.39% average success rate on DeepWideSearch, highlighting the substantial challenge of…
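To illustrate why a strict success rate is hard to raise when a task is both wide (many entities) and deep (each value found through multi-hop retrieval), here is a minimal sketch of an all-or-nothing table match. It assumes, without confirmation from the abstract, that an answer can be represented as a table keyed by entity and that a question counts as solved only when every cell matches. The names `Table`, `normalize`, `is_success`, and `average_success_rate` are hypothetical and are not taken from the benchmark's code.

```python
from typing import Dict, List

# Hypothetical representation (an assumption, not the benchmark's format):
# a table maps an entity name to a dict of attribute name -> value.
Table = Dict[str, Dict[str, str]]


def normalize(value: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(value.lower().split())


def is_success(predicted: Table, gold: Table) -> bool:
    """Return True only if the entity set, the columns, and every cell value match."""
    if set(predicted) != set(gold):
        return False  # a missing or extra entity fails the whole question (width)
    for entity, gold_row in gold.items():
        pred_row = predicted[entity]
        if set(pred_row) != set(gold_row):
            return False  # a missing or extra attribute fails the whole question
        for column, gold_value in gold_row.items():
            if normalize(pred_row[column]) != normalize(gold_value):
                return False  # a single wrong cell fails the whole question (depth)
    return True


def average_success_rate(predictions: List[Table], golds: List[Table]) -> float:
    """Percentage of questions whose predicted table fully matches the gold table."""
    if len(predictions) != len(golds):
        raise ValueError("predictions and golds must have the same length")
    if not golds:
        return 0.0
    hits = sum(is_success(p, g) for p, g in zip(predictions, golds))
    return 100.0 * hits / len(golds)
```

Under this assumed scoring, one missed entity or one wrong cell fails the entire question, so errors compound across both the number of rows and the number of retrieval hops behind each cell.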

Code & Models

Datasets

AIDC-AI/DeepWideSearch
dataset · 646 dl

Taxonomy

Topics: Multimodal Machine Learning Applications · Information Retrieval and Search Behavior · Topic Modeling