ResearchArena: Benchmarking Large Language Models' Ability to Collect and Organize Information as Research Agents
Hao Kang, Chenyan Xiong

TL;DR
ResearchArena is a benchmark that evaluates large language models' ability to conduct academic surveys by simulating three stages of the research process (information discovery, selection, and organization), highlighting current limitations and opportunities for future work.
Contribution
This paper introduces ResearchArena, a new benchmark for assessing LLMs' research capabilities. It includes a large offline environment of academic papers and an evaluation framework for academic survey tasks.
Findings
LLMs underperform keyword-based retrieval methods
Recent reasoning models like DeepSeek-R1 show improved zero-shot performance
Significant opportunities exist for advancing LLMs in autonomous research
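Keyword-based retrieval ranks documents by lexical overlap with the query terms. As a rough illustration, the sketch below scores documents with Okapi BM25, a standard keyword-based ranking function. Whether the paper's keyword baselines use BM25, and with these parameter values (k1 = 1.5, b = 0.75), is an assumption here; the helper names `bm25_scores` and `retrieve` are hypothetical.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Naive lowercase whitespace tokenizer (illustrative only; no stemming or punctuation handling).
    return text.lower().split()


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Hypothetical helper: score each document against the query with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / n_docs
    # Document frequency: number of documents that contain each term at least once.
    df = Counter(term for doc in tokenized for term in set(doc))
    scores = []
    for doc in tokenized:
        tf = Counter(doc)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            # IDF variant with +1 inside the log so it stays non-negative.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            freq = tf[term]
            # Term-frequency saturation (k1) and document-length normalization (b).
            denom = freq + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * freq * (k1 + 1) / denom
        scores.append(score)
    return scores


def retrieve(query: str, docs: list[str], top_k: int = 2) -> list[int]:
    """Hypothetical helper: return indices of the top_k highest-scoring documents."""
    scores = bm25_scores(query, docs)
    ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return ranked[:top_k]


docs = [
    "large language models for survey generation",
    "graph neural networks for molecules",
    "retrieval augmented language models",
]
print(retrieve("language models survey", docs))  # [0, 2]
```

Because such a retriever relies only on term statistics, it needs no training or prompting, which is part of what makes it a strong, cheap baseline for the discovery stage.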
Abstract
Large language models (LLMs) excel across many natural language processing tasks but face challenges in domain-specific, analytical tasks such as conducting research surveys. This study introduces ResearchArena, a benchmark designed to evaluate LLMs' capabilities in conducting academic surveys, a foundational step in academic research. ResearchArena models the process in three stages: (1) information discovery, identifying relevant literature; (2) information selection, evaluating papers' relevance and impact; and (3) information organization, structuring knowledge into hierarchical frameworks such as mind-maps. Notably, mind-map construction is treated as a bonus task, reflecting its supplementary role in survey writing. To support these evaluations, we construct an offline environment of 12M full-text academic papers and 7.9K survey papers. To ensure ethical compliance, we do not…
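The abstract excerpt stops before describing the scoring functions. Purely as a hypothetical illustration, the first two stages could be scored against a held-out survey's reference list: discovery by recall and precision of the retrieved paper IDs, and selection by binary-relevance nDCG@k over a ranked list. The gold set (a survey's cited papers), the metric choices, and the function names below are all assumptions, not the paper's confirmed protocol.

```python
import math


def discovery_recall_precision(retrieved: list[str], gold: set[str]) -> tuple[float, float]:
    """Hypothetical: recall and precision of retrieved paper IDs against a gold reference set."""
    unique = set(retrieved)
    hits = len(unique & gold)
    recall = hits / len(gold) if gold else 0.0
    precision = hits / len(unique) if unique else 0.0
    return recall, precision


def ndcg_at_k(ranked: list[str], gold: set[str], k: int) -> float:
    """Hypothetical: binary-relevance nDCG@k for a ranked selection of papers."""
    # rank is 0-based, so the first position is discounted by log2(2) = 1.
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, pid in enumerate(ranked[:k])
        if pid in gold
    )
    # Ideal ranking places min(|gold|, k) relevant papers at the top.
    ideal_hits = min(len(gold), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


gold = {"p1", "p2", "p3", "p4"}
print(discovery_recall_precision(["p1", "p2", "p9", "p7"], gold))  # (0.5, 0.5)
print(round(ndcg_at_k(["p1", "p9", "p2"], gold, 3), 4))  # 0.7039
```

Recall captures how much of the relevant literature is found at all, while nDCG additionally rewards placing relevant papers near the top, which matches the idea that selection is about judging relevance and impact rather than mere coverage.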
Taxonomy
Topics: Legal Education and Practice Innovations · Artificial Intelligence in Law
