DeepResearch-9K: A Challenging Benchmark Dataset of Deep-Research Agent
Tongzhou Wu, Yuhao Wang, Xinyu Ma, Xiuqiang He, Shuaiqiang Wang, Dawei Yin, Xiangyu Zhao

TL;DR
This paper introduces DeepResearch-9K, a large-scale, challenging dataset for deep-research agents, along with DeepResearch-R1, an open-source training framework, to advance multi-step web exploration and question-answering capabilities.
Contribution
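The paper presents DeepResearch-9K, a large-scale dataset, and DeepResearch-R1, an open-source training framework, both designed for deep-research agents. Together they target the two bottlenecks named in the abstract: the lack of large-scale, challenging datasets and the absence of accessible open-source frameworks for data synthesis and agent training.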
Findings
Agents trained on DeepResearch-9K achieve state-of-the-art results.
DeepResearch-R1 supports multi-turn web interactions and various reinforcement learning approaches.
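The dataset pairs 9,000 questions across three difficulty levels (L1 to L3) with high-quality search trajectories and reasoning chains from Tongyi-DeepResearch-30B-A3B, plus verifiable answers.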
Abstract
Deep-research agents are capable of executing multi-step web exploration, targeted retrieval, and sophisticated question answering. Despite their powerful capabilities, deep-research agents face two critical bottlenecks: (1) the lack of large-scale, challenging datasets with real-world difficulty, and (2) the absence of accessible, open-source frameworks for data synthesis and agent training. To bridge these gaps, we first construct DeepResearch-9K, a large-scale, challenging dataset specifically designed for deep-research scenarios, built from open-source multi-hop question-answering (QA) datasets via a low-cost autonomous pipeline. Notably, it consists of (1) 9,000 questions spanning three difficulty levels from L1 to L3, (2) high-quality search trajectories with reasoning chains from Tongyi-DeepResearch-30B-A3B, a state-of-the-art deep-research agent, and (3) verifiable answers.…
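The three components listed in the abstract (a question with a difficulty level, a search trajectory, and a verifiable answer) can be pictured as one record per question. The sketch below is illustrative only: the class names, field names, action labels, and the exact-match check are all assumptions made here, not the released schema of DeepResearch-9K or the scoring used by DeepResearch-R1.

```python
import string
from dataclasses import dataclass, field
from typing import List

# Difficulty labels taken from the abstract ("L1 to L3").
DIFFICULTY_LEVELS = {"L1", "L2", "L3"}


@dataclass
class TrajectoryStep:
    """One step of a search trajectory (hypothetical layout)."""
    thought: str       # reasoning text produced before acting
    action: str        # e.g. "search" or "visit"; labels are assumed
    observation: str   # what the tool returned


@dataclass
class DeepResearchRecord:
    """One dataset item (hypothetical field names)."""
    question: str
    difficulty: str
    answer: str
    trajectory: List[TrajectoryStep] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject labels outside the three levels named in the abstract.
        if self.difficulty not in DIFFICULTY_LEVELS:
            raise ValueError(f"unknown difficulty: {self.difficulty!r}")


def normalize(text: str) -> str:
    """Lowercase, drop ASCII punctuation, and collapse whitespace."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.split())


def is_correct(prediction: str, record: DeepResearchRecord) -> bool:
    """Assumed verification rule: normalized exact match with the gold answer."""
    return normalize(prediction) == normalize(record.answer)
```

A normalized exact match is only one way an answer could be made verifiable; it is used here because it is simple to check automatically, not because the paper states it.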
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Expert finding and Q&A systems
