Evaluating the Knowledge Dependency of Questions
Hyeongdon Moon, Yoonseok Yang, Jamin Shin, Hangyeol Yu, Seunghyun Lee, Myeongho Jeong, Juneyoung Park, Minsam Kim, Seungtaek Choi

TL;DR
This paper introduces a novel evaluation metric, KDA, for assessing the educational value of automatically generated MCQs by measuring their answerability based on knowledge of the target fact, improving over traditional n-gram metrics.
Contribution
The paper proposes KDA and its automatic variants KDA_disc and KDA_cont, which better evaluate MCQ quality by focusing on knowledge-dependent answerability rather than surface similarity.
Findings
KDA_disc and KDA_cont strongly correlate with human judgments.
KDA metrics improve the prediction of MCQ quality when combined with traditional metrics.
Human studies validate the effectiveness of KDA in classroom settings.
Abstract
The automatic generation of Multiple Choice Questions (MCQ) has the potential to significantly reduce the time educators spend on student assessment. However, existing evaluation metrics for MCQ generation, such as BLEU, ROUGE, and METEOR, focus on the n-gram-based similarity of the generated MCQ to the gold sample in the dataset and disregard its educational value. They fail to evaluate the MCQ's ability to assess the student's knowledge of the corresponding target fact. To tackle this issue, we propose a novel automatic evaluation metric, coined Knowledge Dependent Answerability (KDA), which measures the MCQ's answerability given knowledge of the target fact. Specifically, we first show how to measure KDA based on student responses from a human survey. Then, we propose two automatic evaluation metrics, KDA_disc and KDA_cont, that approximate KDA by leveraging pre-trained language…
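The abstract describes KDA only at a high level. As a rough illustration, assume that "answerable given knowledge of the target fact" is read as "answered correctly when the fact is provided, and incorrectly when it is not". Under that assumption, a KDA-style score for one MCQ can be averaged over survey respondents. The same function also accepts probabilities in place of 0/1 outcomes, loosely echoing the discrete vs. continuous distinction suggested by the names KDA_disc and KDA_cont. The `Response` class and `kda_score` function are hypothetical, and this is a sketch, not the paper's exact formulation.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Response:
    """One respondent's (or simulated solver's) result on a single MCQ.

    Hypothetical structure: each field is either a 0/1 correctness
    indicator (discrete) or a probability of picking the correct
    option (continuous), always in [0, 1].
    """
    with_fact: float     # outcome when the target fact is provided
    without_fact: float  # outcome when the target fact is withheld


def kda_score(responses: Sequence[Response]) -> float:
    """Illustrative KDA-style score (an assumption, not the paper's formula).

    Averages, over respondents, the chance of answering correctly with
    the target fact AND incorrectly without it.
    """
    if not responses:
        raise ValueError("need at least one response")
    total = 0.0
    for r in responses:
        for value in (r.with_fact, r.without_fact):
            if not 0.0 <= value <= 1.0:
                raise ValueError("outcomes must lie in [0, 1]")
        # Knowledge-dependent: solvable with the fact, not solvable without it.
        total += r.with_fact * (1.0 - r.without_fact)
    return total / len(responses)


if __name__ == "__main__":
    # Discrete outcomes, e.g. four human respondents: 2 of 4 are knowledge-dependent.
    survey = [Response(1, 0), Response(1, 1), Response(0, 0), Response(1, 0)]
    print(kda_score(survey))  # 0.5

    # Continuous outcomes, e.g. hypothetical model probabilities of the correct option.
    model = [Response(0.9, 0.2), Response(0.6, 0.5)]
    print(round(kda_score(model), 2))  # 0.51
```

Multiplying `with_fact` by `1 - without_fact` treats the two conditions as independent; with 0/1 inputs this reduces to counting respondents who succeed only when given the fact.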
Taxonomy
Topics: Topic Modeling · Educational Technology and Assessment · Natural Language Processing Techniques
