EduResearchBench: A Hierarchical Atomic Task Decomposition Benchmark for Full-Lifecycle Educational Research
Houping Yue, Zixiang Di, Mei Jiang, Bingdong Li, Hao Hao, Yu Song, Bo Jiang, Aimin Zhou

TL;DR
EduResearchBench introduces a hierarchical task decomposition framework and an evaluation platform for assessing LLMs in scholarly educational writing, emphasizing fine-grained diagnostics and curriculum learning to improve model performance.
Contribution
The paper presents EduResearchBench, a benchmark built on the Hierarchical Atomic Task Decomposition (HATD) framework for fine-grained evaluation of LLMs in academic research workflows, along with a curriculum learning strategy and a specialized model, EduWrite.
Findings
EduWrite outperforms larger models on core metrics.
Hierarchical task decomposition enables fine-grained assessment.
Data quality and staged (curriculum) training matter more than model size.
Abstract
While Large Language Models (LLMs) are reshaping the paradigm of AI for Social Science (AI4SS), rigorously evaluating their capabilities in scholarly writing remains a major challenge. Existing benchmarks largely emphasize single-shot, monolithic generation and thus lack the fine-grained assessments required to reflect complex academic research workflows. To fill this gap, we introduce EduResearchBench, the first comprehensive evaluation platform dedicated to educational academic writing. EduResearchBench is built upon our Hierarchical Atomic Task Decomposition (HATD) framework, which decomposes an end-to-end research workflow into six specialized research modules (e.g., Quantitative Analysis, Qualitative Research, and Policy Research) spanning 24 fine-grained atomic tasks. This taxonomy enables an automated evaluation pipeline that mitigates a key limitation of holistic scoring, where…
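As a rough illustration of why a hierarchical taxonomy supports finer diagnostics than a single holistic score, the sketch below models modules that contain atomic tasks and reports scores at each level. It is not the authors' pipeline: the class names, the task labels (`task_a`, `task_b`), and the scores are all hypothetical placeholders, and only the module names are taken from the abstract.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import List, Tuple


@dataclass
class AtomicTask:
    name: str     # hypothetical label; the paper's real task names are not used here
    score: float  # made-up illustrative score in [0, 1]


@dataclass
class Module:
    name: str
    tasks: List[AtomicTask] = field(default_factory=list)

    def score(self) -> float:
        # Module-level score: plain average over its atomic tasks (an assumption).
        return mean(t.score for t in self.tasks)


def holistic_score(modules: List[Module]) -> float:
    # A single aggregate number, analogous to holistic scoring.
    return mean(m.score() for m in modules)


def weakest_task(modules: List[Module]) -> Tuple[str, str, float]:
    # Fine-grained diagnostic: locate the lowest-scoring atomic task.
    return min(
        ((m.name, t.name, t.score) for m in modules for t in m.tasks),
        key=lambda item: item[2],
    )


modules = [
    Module("Quantitative Analysis", [AtomicTask("task_a", 0.75), AtomicTask("task_b", 0.25)]),
    Module("Qualitative Research", [AtomicTask("task_a", 0.75), AtomicTask("task_b", 0.75)]),
]

print(holistic_score(modules))  # one number that hides where the weakness lies
print(weakest_task(modules))    # names the specific module and task that underperform
```

In this toy setup the aggregate looks moderate, while the per-task view points directly at the one task dragging the score down. That is the kind of diagnostic a decomposed taxonomy makes possible.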
Taxonomy
Topics: Computational and Text Analysis Methods · Topic Modeling · Machine Learning in Materials Science
