DeepWideSearch: Benchmarking Depth and Width in Agentic Information Seeking
Tian Lan, Bin Zhu, Qianghuai Jia, Junyang Ren, Haijun Li, Longyue Wang, Zhao Xu, Weihua Luo, Kaifu Zhang

TL;DR
DeepWideSearch introduces a new benchmark to evaluate agents' ability to perform both deep reasoning and wide-scale information retrieval, revealing current limitations and guiding future improvements in agent architectures.
Contribution
This paper presents DeepWideSearch, the first benchmark explicitly designed to evaluate how agents integrate depth and width in information-seeking tasks.
Findings
State-of-the-art agents achieve an average success rate of only 2.39% on the benchmark.
Four key failure modes are identified: lack of reflection, overreliance on internal knowledge, insufficient retrieval, and context overflow.
Benchmark is publicly released to foster future research.
Abstract
Current search agents fundamentally lack the ability to simultaneously perform deep reasoning over multi-hop retrieval and wide-scale information collection, a critical deficiency for real-world applications like comprehensive market analysis and business development. To bridge this gap, we introduce DeepWideSearch, the first benchmark explicitly designed to evaluate how well agents integrate depth and width in information seeking. In DeepWideSearch, agents must process a large volume of data, with each item requiring deep reasoning over multi-hop retrieval paths. Specifically, we propose two methods to convert established datasets, resulting in a curated collection of 220 questions spanning 15 diverse domains. Extensive experiments demonstrate that even state-of-the-art agents achieve only a 2.39% average success rate on DeepWideSearch, highlighting the substantial challenge of…
Taxonomy
Topics: Multimodal Machine Learning Applications · Information Retrieval and Search Behavior · Topic Modeling
