An Automatic Question Usability Evaluation Toolkit
Steven Moore, Eamon Costello, Huy A. Nguyen, John Stamper

TL;DR
SAQUET is an open-source, AI-powered toolkit that automates comprehensive evaluation of multiple-choice questions, significantly improving the detection of question flaws beyond traditional metrics.
Contribution
Introduces SAQUET, a novel automated tool leveraging large language models and the IWF rubric for detailed MCQ quality assessment.
Findings
Over 94% accuracy in flaw detection compared to human evaluators
Effectively distinguishes flawed from flawless questions across multiple domains
Outperforms traditional automated evaluation metrics
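The accuracy finding above is an agreement rate between automated flaw labels and human labels. The following is a minimal sketch of how such a rate could be computed per question and per criterion. It is not SAQUET's code: the function name `flaw_agreement`, the criterion names, and the labels are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List


def flaw_agreement(auto: List[Dict[str, bool]], human: List[Dict[str, bool]]) -> float:
    """Return the fraction of (question, criterion) labels on which the
    automated labels match the human labels."""
    matches = 0
    total = 0
    for auto_q, human_q in zip(auto, human):
        for criterion, human_label in human_q.items():
            total += 1
            # A criterion missing from the automated output counts as "no flaw".
            matches += int(auto_q.get(criterion, False) == human_label)
    return matches / total if total else 0.0


# Hypothetical labels: two questions, two illustrative criteria (True = flaw present).
human_labels = [
    {"all_of_the_above": True, "negative_wording": False},
    {"all_of_the_above": False, "negative_wording": False},
]
auto_labels = [
    {"all_of_the_above": True, "negative_wording": True},
    {"all_of_the_above": False, "negative_wording": False},
]

rate = flaw_agreement(auto_labels, human_labels)
print(f"agreement: {rate:.2f}")
```

Counting agreement per criterion rather than per question keeps a single missed flaw from marking the whole question as a mismatch.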
Abstract
Evaluating multiple-choice questions (MCQs) involves either labor-intensive human assessments or automated methods that prioritize readability, often overlooking deeper question design flaws. To address this issue, we introduce the Scalable Automatic Question Usability Evaluation Toolkit (SAQUET), an open-source tool that leverages the Item-Writing Flaws (IWF) rubric for a comprehensive and automated quality evaluation of MCQs. By harnessing the latest in large language models such as GPT-4, advanced word embeddings, and Transformers designed to analyze textual complexity, SAQUET effectively pinpoints and assesses a wide array of flaws in MCQs. We first demonstrate the discrepancy between commonly used automated evaluation metrics and the human assessment of MCQ quality. Then we evaluate SAQUET on a diverse dataset of MCQs across the five domains of Chemistry, Statistics, Computer…
Taxonomy
Topics: Speech and dialogue systems · Intelligent Tutoring Systems and Adaptive Learning · Second Language Acquisition and Learning
Methods: Attention Is All You Need · Linear Layer · Byte Pair Encoding · Label Smoothing · Adam · Residual Connection · Position-Wise Feed-Forward Layer · Multi-Head Attention · Dropout · Dense Connections
