Automated Generation and Tagging of Knowledge Components from Multiple-Choice Questions

Steven Moore; Robin Schmucker; Tom Mitchell; John Stamper

arXiv:2405.20526 · cs.AI · June 3, 2024

TL;DR

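This paper uses GPT-4 to automatically generate and tag Knowledge Components (KCs) for multiple-choice questions in Chemistry and E-Learning. The LLM-generated KCs matched the human-created ones for 56% of Chemistry and 35% of E-Learning questions, and human evaluators preferred the LLM-generated KCs in about two-thirds of cases.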

Contribution

The paper introduces an LLM-based method for automated KC generation and a clustering algorithm that groups questions by KC, reducing the manual effort and domain expertise required.

Findings

01. The LLM achieved a 56% match rate for Chemistry KCs and a 35% match rate for E-Learning KCs.

02. Human evaluators preferred the LLM-generated KCs over the human-created ones in about two-thirds of cases.

03. The clustering algorithm effectively grouped questions by their underlying KCs without explicit labels.
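
As a rough illustration of what a match rate like the one reported in the findings measures, the sketch below computes the share of questions whose LLM-generated KC equals the human-assigned KC. The matching criterion used in the paper is not described on this page, so the normalized exact-string comparison here is an assumption, and the function names (`normalize`, `match_rate`) and the toy labels are hypothetical.

```python
from typing import Sequence


def normalize(label: str) -> str:
    """Lowercase and collapse whitespace so trivially different labels compare equal."""
    return " ".join(label.lower().split())


def match_rate(llm_kcs: Sequence[str], human_kcs: Sequence[str]) -> float:
    """Fraction of questions whose LLM KC equals the human KC after normalization.

    Assumption: a "match" is exact equality of normalized labels. The paper may
    use a different (for example, expert-judged) criterion.
    """
    if len(llm_kcs) != len(human_kcs):
        raise ValueError("Both sequences must have one KC label per question.")
    if not llm_kcs:
        raise ValueError("At least one question is required.")
    matches = sum(normalize(a) == normalize(b) for a, b in zip(llm_kcs, human_kcs))
    return matches / len(llm_kcs)


# Hypothetical toy labels, one per question (not data from the paper).
llm = [
    "Balance chemical equations",
    "identify  limiting reagent",
    "Compute molar mass",
    "Apply ideal gas law",
]
human = [
    "balance chemical equations",
    "Identify limiting reagent",
    "Convert moles to grams",
    "Apply Ideal Gas Law",
]

rate = match_rate(llm, human)
print(f"Match rate: {rate:.0%}")
```

Exact string matching is the strictest possible criterion; a semantic comparison or expert judgment would likely count more pairs as matches.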

Abstract

Knowledge Components (KCs) linked to assessments enhance the measurement of student learning, enrich analytics, and facilitate adaptivity. However, generating and linking KCs to assessment items requires significant effort and domain-specific knowledge. To streamline this process for higher-education courses, we employed GPT-4 to generate KCs for multiple-choice questions (MCQs) in Chemistry and E-Learning. We analyzed discrepancies between the KCs generated by the Large Language Model (LLM) and those made by humans through evaluation from three domain experts in each subject area. This evaluation aimed to determine whether, in instances of non-matching KCs, evaluators showed a preference for the LLM-generated KCs over their human-created counterparts. We also developed an ontology induction algorithm to cluster questions that assess similar KCs based on their content. Our most…

Code & Models

Repositories

stevenjamesmoore/learningatscale24 (Official)

Taxonomy

Methods: Attention Is All You Need · Linear Layer · Byte Pair Encoding · Label Smoothing · Adam · Residual Connection · Position-Wise Feed-Forward Layer · Multi-Head Attention · Dropout · Dense Connections