DeepSurvey-Bench: Evaluating Academic Value of Automatically Generated Scientific Survey

Guo-Biao Zhang; Ding-Yuan Liu; Da-Yi Wu; Tian Lan; Heyan Huang; Zhijing Wu; Xian-Ling Mao

arXiv:2601.15307 · cs.AI · January 23, 2026

Open Access

TL;DR

DeepSurvey-Bench is a new benchmark designed to evaluate the true academic quality of automatically generated scientific surveys across multiple dimensions, addressing limitations of existing surface-level evaluation methods.

Contribution

It introduces a comprehensive evaluation framework with annotated datasets to assess the deep academic value of generated surveys, surpassing traditional surface-level metrics.

Findings

01

High consistency with human judgment in academic value assessment

02

Addresses key limitations of existing benchmarks

03

Provides multi-dimensional evaluation criteria

Abstract

The rapid development of automated scientific survey generation technology has made it increasingly important to establish a comprehensive benchmark for evaluating the quality of generated surveys. Nearly all existing evaluation benchmarks rely on flawed selection criteria, such as citation counts and structural coherence, to select human-written surveys as ground-truth survey datasets. They then use surface-level metrics, such as structural quality and reference relevance, to evaluate generated surveys. However, these benchmarks have two key issues: (1) the ground-truth survey datasets are unreliable because they lack academic-dimension annotations; (2) the evaluation metrics focus only on the surface quality of a survey, such as logical coherence. As a result, existing benchmarks cannot evaluate the deep "academic value" of generated surveys, such as the core research objectives and the…

Taxonomy

Topics: Survey Methodology and Nonresponse · Mobile Crowdsensing and Crowdsourcing · Expert finding and Q&A systems