ScholarSearch: Benchmarking Scholar Searching Ability of LLMs

Junting Zhou; Wang Li; Yiyan Liao; Nengyuan Zhang; Tingjia Miao; Zhihui Qi; Yuhan Wu; Tong Yang

arXiv:2506.13784·cs.IR·June 23, 2025

Open Access · 1 Dataset

TL;DR

ScholarSearch introduces a specialized benchmark dataset designed to evaluate the complex academic search and literature retrieval capabilities of Large Language Models across multiple disciplines, emphasizing real-world research scenarios.

Contribution

It is the first dataset tailored specifically for assessing LLMs' academic search abilities, addressing limitations of existing benchmarks by focusing on practical, difficult, and broad academic information retrieval tasks.

Findings

01 Enhanced evaluation of LLMs in academic search tasks

02 Identification of current model limitations in complex literature retrieval

03 Benchmark promotes development of more capable academic search models

Abstract

The search capabilities of Large Language Models (LLMs) have garnered significant attention. Existing benchmarks, such as OpenAI's BrowseComp, primarily focus on general search scenarios and fail to adequately address the specific demands of academic search. These demands include deeper literature tracing and organization, professional support for academic databases, the ability to navigate long-tail academic knowledge, and academic rigor. Here, we propose ScholarSearch, the first dataset specifically designed to evaluate the complex information retrieval capabilities of LLMs in academic research. ScholarSearch possesses the following key characteristics: Academic Practicality, where question content closely mirrors real academic learning and research environments, avoiding deliberately misleading models; High Difficulty, with answers that are challenging for…

Code & Models

Datasets

PKU-DS-LAB/ScholarSearch
dataset · 35 dl

Taxonomy

Topics: Expert finding and Q&A systems · Topic Modeling · Artificial Intelligence in Healthcare and Education

Methods: Focus