Assessing Distractors in Multiple-Choice Tests
Vatsal Raina, Adian Liusie, Mark Gales

TL;DR
This paper introduces automated metrics to evaluate distractors in multiple-choice reading comprehension tests, focusing on incorrectness, plausibility, and diversity to improve assessment quality.
Contribution
It proposes automated metrics for distractor quality that combine a classification-based measure (incorrectness), a probability-based measure (plausibility), and an embedding-based measure (diversity).
Findings
The proposed metrics effectively evaluate distractor quality.
Plausibility correlates with ChatGPT's interpretations.
Diversity assessment improves distractor selection.
Abstract
Multiple-choice tests are a common approach for assessing candidates' comprehension skills. Standard multiple-choice reading comprehension exams require candidates to select the correct answer option from a discrete set based on a question in relation to a contextual passage. For appropriate assessment, the distractor answer options must by definition be incorrect but plausible and diverse. However, generating good quality distractors satisfying these criteria is a challenging task for content creators. We propose automated assessment metrics for the quality of distractors in multiple-choice reading comprehension tests. Specifically, we define quality in terms of the incorrectness, plausibility and diversity of the distractor options. We assess incorrectness using the classification ability of a binary multiple-choice reading comprehension system. Plausibility is assessed by considering…
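The three quality criteria named above can be illustrated with a small Python sketch. It is not the authors' implementation: the abstract is truncated before the plausibility definition, so the code assumes plausibility is the probability mass a multi-class model assigns to the distractor options, and it assumes diversity is one minus the mean pairwise cosine similarity of distractor embeddings. All function names, and the choice of cosine similarity, are hypothetical; the model probabilities and embeddings are taken as given inputs.

```python
import math
from itertools import combinations


def incorrectness(p_option_is_correct):
    """Assumed score: 1 minus a binary classifier's probability that the option is correct."""
    return 1.0 - p_option_is_correct


def distractor_confidence(option_probs, correct_index):
    """Assumed plausibility: total probability mass placed on the distractor options."""
    return sum(p for i, p in enumerate(option_probs) if i != correct_index)


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def diversity(distractor_embeddings):
    """Assumed diversity: 1 minus the mean pairwise cosine similarity (needs >= 2 distractors)."""
    pairs = list(combinations(distractor_embeddings, 2))
    mean_similarity = sum(cosine(u, v) for u, v in pairs) / len(pairs)
    return 1.0 - mean_similarity


if __name__ == "__main__":
    # Hypothetical model outputs for one four-option question; option 0 is correct.
    probs = [0.7, 0.1, 0.1, 0.1]
    print(distractor_confidence(probs, correct_index=0))
    # Hypothetical 2-d embeddings for three distractors.
    print(diversity([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]))
```

Under these assumptions, a higher distractor confidence means the model finds the wrong options more tempting, and a diversity near zero flags distractors that are near-paraphrases of one another.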
Taxonomy
Topics: Topic Modeling · Text Readability and Simplification · Intelligent Tutoring Systems and Adaptive Learning
Methods: Sparse Evolutionary Training
