ErrEval: Error-Aware Evaluation for Question Generation through Explicit Diagnostics

Weiping Fu; Bifan Wei; Jingyi Hao; Yushun Zhang; Jian Zhang; Jiaxin Wang; Bo Li; Yu He; Lingling Zhang; Jun Liu

arXiv:2601.10406 · cs.AI · January 16, 2026


TL;DR

ErrEval introduces an error-aware evaluation framework for question generation that explicitly diagnoses common errors to improve evaluation accuracy and alignment with human judgments.

Contribution

The paper presents ErrEval, a framework that adds explicit error diagnostics to question generation (QG) evaluation. It targets a limitation of existing black-box, holistic evaluators, which score questions without explicitly modeling the errors they contain.

Findings

1. ErrEval improves correlation with human judgments.

2. Explicit diagnostics mitigate the overestimation of low-quality questions.

3. Its effectiveness is demonstrated on three benchmark datasets.
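The page reports better agreement with human judgments but does not name the coefficient used. As an illustration only, assuming a rank-based measure such as Spearman's rho (a common choice in meta-evaluation), the sketch below computes the correlation between an evaluator's scores and human ratings for the same set of questions. The helpers `average_ranks`, `pearson`, and `spearman` are hypothetical names, and the numbers in the demo are toy values, not results from the paper.

```python
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Assign 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j map to ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (vx * vy) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation computed on (tie-averaged) ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    human = [1, 2, 3, 4, 5]      # toy human ratings
    evaluator = [2, 1, 4, 3, 5]  # toy evaluator scores for the same questions
    print(round(spearman(human, evaluator), 3))  # 0.8
```

Ties receive the mean of their positions, so repeated Likert-style scores do not bias the ranking. In practice, `scipy.stats.spearmanr` computes the same statistic.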

Abstract

Automatic Question Generation (QG) often produces outputs with critical defects, such as factual hallucinations and answer mismatches. However, existing evaluation methods, including LLM-based evaluators, mainly adopt a black-box, holistic paradigm without explicit error modeling, so such defects are neglected and question quality is overestimated. To address this issue, we propose ErrEval, a flexible Error-aware Evaluation framework that enhances QG evaluation through explicit error diagnostics. Specifically, ErrEval reformulates evaluation as a two-stage process: error diagnosis followed by informed scoring. In the first stage, a lightweight plug-and-play Error Identifier detects and categorizes common errors across structural, linguistic, and content-related aspects. These diagnostic signals are then incorporated as explicit evidence to guide LLM evaluators toward…
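The two-stage process (diagnose first, then score with the diagnoses as evidence) can be sketched as follows. This is a minimal illustration under loud assumptions: the page describes the Error Identifier only as a lightweight plug-and-play component, so `identify_errors` here is a hypothetical stand-in built from toy string heuristics, one per aspect group named in the abstract. `Diagnosis`, `build_evaluator_prompt`, the error labels, and the 1-to-5 scale are likewise illustrative; the sketch stops at building the judge prompt and does not call any LLM API.

```python
from dataclasses import dataclass

# Aspect groups named in the abstract; the error labels below are hypothetical.
ASPECTS = ("structural", "linguistic", "content")


@dataclass(frozen=True)
class Diagnosis:
    aspect: str  # one of ASPECTS
    label: str   # hypothetical error label
    note: str    # short human-readable evidence


def identify_errors(passage: str, question: str, answer: str) -> list[Diagnosis]:
    """Stage 1 stand-in: toy heuristics, NOT the paper's Error Identifier."""
    found = []
    if not question.rstrip().endswith("?"):
        found.append(Diagnosis("structural", "not_a_question",
                               "Output does not end with a question mark."))
    words = question.lower().split()
    if any(a == b for a, b in zip(words, words[1:])):
        found.append(Diagnosis("linguistic", "repeated_word",
                               "Question contains an immediately repeated word."))
    if answer.lower() not in passage.lower():
        found.append(Diagnosis("content", "answer_not_in_passage",
                               "Target answer does not appear in the passage."))
    return found


def build_evaluator_prompt(passage: str, question: str, answer: str,
                           diagnoses: list[Diagnosis]) -> str:
    """Stage 2 input: hand the diagnostics to an LLM judge as explicit evidence."""
    if diagnoses:
        evidence = "\n".join(f"- [{d.aspect}] {d.label}: {d.note}" for d in diagnoses)
    else:
        evidence = "- No errors detected."
    return (
        "Evaluate the generated question.\n"
        f"Passage: {passage}\n"
        f"Answer: {answer}\n"
        f"Question: {question}\n"
        "Detected errors (treat as evidence and verify before scoring):\n"
        f"{evidence}\n"
        "Give a score from 1 to 5."  # hypothetical scale
    )


if __name__ == "__main__":
    passage = "Paris is the capital of France."
    question = "What is is the capital of France"
    diagnoses = identify_errors(passage, question, "Lyon")
    print(build_evaluator_prompt(passage, question, "Lyon", diagnoses))
```

Keeping stage 1 separate from the judge is what makes the identifier swappable in this sketch: any function that returns a list of `Diagnosis` objects can replace the heuristics without changing the prompt builder.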


Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Expert finding and Q&A systems