Super Research: Answering Highly Complex Questions with Large Language Models through Super Deep and Super Wide Research

Yubo Dong; Nianhao You; Yuxuan Hou; Zixun Sun; Yue Zhang; Liang Zhang; Siyuan Zhao; Hehe Fan

arXiv:2603.00582 · cs.CL · March 4, 2026

Open Access

TL;DR

Super Research introduces a comprehensive framework and benchmark for evaluating large language models on highly complex research questions requiring extensive evidence gathering, planning, and synthesis, serving as a robustness test for research capabilities.

Contribution

The paper presents a novel task, benchmark, and evaluation protocol for assessing LLMs on complex research tasks involving structured decomposition, wide retrieval, and deep investigation.

Findings

01 The Super Research benchmark comprises 300 expert-written questions across diverse domains.

02 Each question can require up to 100+ retrieval steps and 1,000+ web pages to answer.

03 Proficiency in Super Research correlates with the overall research competence of LLMs.

Abstract

While Large Language Models (LLMs) have demonstrated proficiency in Deep Research or Wide Search, their capacity to solve highly complex questions (those requiring long-horizon planning, massive evidence gathering, and synthesis across heterogeneous sources) remains largely unexplored. We introduce Super Research, a task for complex autonomous research that integrates (i) structured decomposition into a research plan, (ii) super wide retrieval for diverse perspectives, and (iii) super deep investigation to resolve uncertainties through iterative queries. To evaluate this capability, we curated a benchmark of 300 expert-written questions across diverse domains, each requiring up to 100+ retrieval steps and 1,000+ web pages to reconcile conflicting evidence. Super Research produces verifiable reports with fine-grained citations and intermediate artifacts (e.g., outlines and tables) to…
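The three components named in the abstract can be pictured with a toy sketch. This is not the paper's method: the in-memory `CORPUS` stands in for the web, and every name here (`decompose`, `wide_retrieve`, `deep_investigate`, `research`, `Finding`) and the keyword-matching logic are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field

# Toy stand-in for the web; the documents are invented for illustration only.
CORPUS = {
    "doc1": "solar capacity grew in 2023",
    "doc2": "solar capacity fell in 2023",
    "doc3": "wind capacity grew in 2023",
    "doc4": "solar capacity grew in 2023 according to revised figures",
}


@dataclass
class Finding:
    sub_question: str
    claim: str
    citations: list = field(default_factory=list)
    steps: int = 0  # number of retrieval steps spent on this sub-question


def decompose(question):
    """(i) Hypothetical planner: split the question into sub-questions on ';'."""
    return [part.strip() for part in question.split(";") if part.strip()]


def wide_retrieve(query, corpus):
    """(ii) Wide retrieval: ids of all documents containing every query term."""
    terms = query.lower().split()
    return sorted(
        doc_id
        for doc_id, text in corpus.items()
        if all(term in text.split() for term in terms)
    )


def deep_investigate(sub_question, corpus, refinements, max_steps=5):
    """(iii) Deep investigation: refine the query while the hits disagree."""
    query = sub_question
    hits = wide_retrieve(query, corpus)
    steps = 1
    for extra in refinements:
        distinct_claims = {corpus[doc_id] for doc_id in hits}
        if len(distinct_claims) <= 1 or steps >= max_steps:
            break  # evidence is consistent, or the step budget is spent
        query = f"{query} {extra}"
        hits = wide_retrieve(query, corpus)
        steps += 1
    claim = corpus[hits[0]] if hits else "no evidence found"
    return Finding(sub_question, claim, hits, steps)


def research(question, corpus=CORPUS, refinements=("revised",)):
    """Plan, then investigate each sub-question; findings carry citations."""
    return [deep_investigate(sq, corpus, refinements) for sq in decompose(question)]


if __name__ == "__main__":
    for finding in research("solar capacity; wind capacity"):
        print(finding)
```

In this sketch the "solar capacity" sub-question first returns conflicting documents, so one refinement step narrows it to a single cited source, while "wind capacity" resolves in a single step. A real system would replace the keyword match with web search and the fixed refinement list with model-generated follow-up queries.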

Taxonomy

Topics: Computational and Text Analysis Methods · Artificial Intelligence in Healthcare and Education · Topic Modeling