ScholarSearch: Benchmarking Scholar Searching Ability of LLMs
Junting Zhou, Wang Li, Yiyan Liao, Nengyuan Zhang, Tingjia Miao, Zhihui Qi, Yuhan Wu, Tong Yang

TL;DR
ScholarSearch introduces a specialized benchmark dataset designed to evaluate the complex academic search and literature retrieval capabilities of Large Language Models across multiple disciplines, emphasizing real-world research scenarios.
Contribution
It is the first dataset tailored specifically for assessing LLMs' academic search abilities, addressing limitations of existing benchmarks by focusing on practical, difficult, and broad academic information retrieval tasks.
Findings
Enhanced evaluation of LLMs in academic search tasks
Identification of current model limitations in complex literature retrieval
Benchmark promotes development of more capable academic search models
Abstract
The search capabilities of Large Language Models (LLMs) have garnered significant attention. Existing benchmarks, such as OpenAI's BrowseComp, primarily focus on general search scenarios and fail to adequately address the specific demands of academic search. These demands include deeper literature tracing and organization, professional support for academic databases, the ability to navigate long-tail academic knowledge, and ensuring academic rigor. Here, we propose ScholarSearch, the first dataset specifically designed to evaluate the complex information retrieval capabilities of LLMs in academic research. ScholarSearch possesses the following key characteristics: Academic Practicality, where question content closely mirrors real academic learning and research environments without deliberately misleading the models; High Difficulty, with answers that are challenging for…
Taxonomy
Topics: Expert finding and Q&A systems · Topic Modeling · Artificial Intelligence in Healthcare and Education
