Leveraging Large Language Models for Identifying Knowledge Components

Canwen Wang; Jionghao Lin; Kenneth R. Koedinger

arXiv:2511.09935·cs.CL·November 14, 2025

Open Access

TL;DR

This paper explores using large language models to automate the identification of knowledge components in educational content, addressing redundancy and performance issues through semantic merging techniques.

Contribution

It introduces a novel semantic merging method to improve LLM-generated knowledge components, enhancing accuracy and reducing redundancy.

Findings

01

Scaling the LLM prompting strategy to a larger dataset, on its own, underperforms the expert-designed KC model (RMSE 0.4285 vs. 0.4206).

02

Semantic merging significantly reduces redundant KCs.

03

Optimized thresholding improves RMSE and reduces KC count.

Abstract

Knowledge Components (KCs) are foundational to adaptive learning systems, but their manual identification by domain experts is a significant bottleneck. While Large Language Models (LLMs) offer a promising avenue for automating this process, prior research has been limited to small datasets and has been shown to produce superfluous, redundant KC labels. This study addresses these limitations by first scaling a "simulated textbook" LLM prompting strategy (using GPT-4o-mini) to a larger dataset of 646 multiple-choice questions. We found that this initial automated approach performed significantly worse than an expert-designed KC model (RMSE 0.4285 vs. 0.4206) and generated an excessive number of KCs (569 vs. 101). To address the issue of redundancy, we proposed and evaluated a novel method for merging semantically similar KC labels based on their cosine similarity. This merging strategy…
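The merging idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy embedding vectors, the example KC labels, the 0.9 threshold, and the choice to merge transitively (single linkage via union-find) are all assumptions made for illustration. In practice the vectors would come from a sentence-embedding model, which the text above does not name.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def merge_similar_labels(embeddings, threshold):
    """Map each KC label to a representative label.

    Two labels are merged when their cosine similarity is at or above
    `threshold`; merges are transitive (an assumption, not taken from
    the paper). The representative is the earliest label in the group.
    """
    labels = list(embeddings)
    order = {label: i for i, label in enumerate(labels)}
    parent = {label: label for label in labels}

    def find(label):
        # Follow parent links to the group root, compressing the path.
        while parent[label] != label:
            parent[label] = parent[parent[label]]
            label = parent[label]
        return label

    for a, b in combinations(labels, 2):
        if cosine_similarity(embeddings[a], embeddings[b]) >= threshold:
            root_a, root_b = find(a), find(b)
            if root_a == root_b:
                continue
            # Keep the earlier label as the group's representative.
            if order[root_a] > order[root_b]:
                root_a, root_b = root_b, root_a
            parent[root_b] = root_a

    return {label: find(label) for label in labels}


# Hypothetical KC labels with made-up 3-dimensional embeddings.
toy_embeddings = {
    "identify the independent variable": [1.0, 0.0, 0.1],
    "recognize the independent variable": [0.9, 0.1, 0.1],
    "compute the mean": [0.0, 1.0, 0.0],
}

mapping = merge_similar_labels(toy_embeddings, threshold=0.9)
merged_kc_count = len(set(mapping.values()))
```

In this toy case the two near-duplicate labels collapse into one KC while the unrelated label stays separate. Lowering the threshold merges more aggressively and shrinks the KC count, which is why the threshold is the natural parameter to tune.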


Taxonomy

Topics: Artificial Intelligence in Healthcare and Education · Text Readability and Simplification · Intelligent Tutoring Systems and Adaptive Learning