DeepSurvey-Bench: Evaluating Academic Value of Automatically Generated Scientific Survey
Guo-Biao Zhang, Ding-Yuan Liu, Da-Yi Wu, Tian Lan, Heyan Huang, Zhijing Wu, Xian-Ling Mao

TL;DR
DeepSurvey-Bench is a new benchmark designed to evaluate the true academic quality of automatically generated scientific surveys across multiple dimensions, addressing limitations of existing surface-level evaluation methods.
Contribution
It introduces a comprehensive evaluation framework, together with annotated datasets, to assess the deep academic value of generated surveys rather than relying only on traditional surface-level metrics.
Findings
High consistency with human judgment in academic value assessment
Addresses key limitations of existing benchmarks
Provides multi-dimensional evaluation criteria
Abstract
The rapid development of automated scientific survey generation technology has made it increasingly important to establish a comprehensive benchmark for evaluating the quality of generated surveys. Nearly all existing evaluation benchmarks rely on flawed selection criteria, such as citation counts and structural coherence, to select human-written surveys as ground-truth survey datasets, and then use surface-level metrics, such as structural quality and reference relevance, to evaluate generated surveys. However, these benchmarks have two key issues: (1) the ground-truth survey datasets are unreliable because they lack academic-dimension annotations; (2) the evaluation metrics focus only on the surface quality of a survey, such as logical coherence. Together, these issues prevent existing benchmarks from evaluating the deep "academic value" of generated surveys, such as the core research objectives and the…
Taxonomy
Topics: Survey Methodology and Nonresponse · Mobile Crowdsensing and Crowdsourcing · Expert finding and Q&A systems
