ResearcherBench: Evaluating Deep AI Research Systems on the Frontiers of Scientific Inquiry
Tianze Xu, Pengrui Lu, Lyumanshan Ye, Xiangkun Hu, Pengfei Liu

TL;DR
ResearcherBench is a benchmark for evaluating advanced AI research systems on frontier scientific questions. It combines expert rubric assessment with factual assessment to measure insight quality, faithfulness, and groundedness.
Contribution
It introduces the first benchmark focused on evaluating deep AI research systems on frontier scientific inquiry, including a curated dataset and a dual evaluation framework.
Findings
OpenAI Deep Research outperforms the other systems on open-ended questions
Gemini Deep Research shows strong insight generation
Benchmark reveals gaps in current AI research systems
Abstract
The emergence of deep research systems presents significant capabilities in problem-solving, extending from basic queries to sophisticated research tasks. However, existing benchmarks primarily evaluate these systems as agents for web retrieval and report generation, overlooking their potential to discover novel insights on the frontiers of scientific research. To address this gap, we introduce ResearcherBench, the first benchmark focused on evaluating the capabilities of these advanced, agentic systems, which we refer to as Deep AI Research Systems (DARS), on frontier AI scientific questions. We compiled a dataset of 65 research questions expertly selected from real-world scientific scenarios such as laboratory discussions and interviews, spanning 35 different AI subjects and categorized into three types: technical details, literature review, and open consulting. Our dual evaluation…
Taxonomy
Topics: Scientific Computing and Data Management · Topic Modeling · Artificial Intelligence in Healthcare and Education
