A Framework for Human-AI Q-Matrix Refinement: A NeuralCDM Evaluation
Ying Zhang, Ningxi Cheng, Yizhu Gao, Hongmei Li, Lehong Shi, Nicholas Young, Geng Yuan, Xiaoming Zhai

TL;DR
This paper introduces a human-AI collaborative framework for refining Q-matrices in assessments using large language models and NeuralCDM, improving model fit and enabling privacy-preserving deployment.
Contribution
It presents a novel framework that combines LLMs and NeuralCDM for efficient, empirically grounded Q-matrix refinement, surpassing the model fit of an expert-crafted baseline.
Findings
LLM-generated Q-matrices can outperform expert-crafted ones in model fit.
Locally deployed LLMs achieve performance comparable to cloud-served models.
Iterative refinement enhances the explanatory power of Q-matrices.
Abstract
Q-matrices are a cornerstone of theory-driven assessment and learning analytics, making item demands and students' underlying knowledge components and misconceptions explicit and actionable. However, Q-matrices are typically crafted by experts, making them time-consuming to build, prone to subjectivity, and difficult to validate empirically. We propose a framework for human-AI Q-matrix refinement in which large language models (LLMs) generate candidate Q-matrices using structured, misconception-aware prompting, and NeuralCDM provides an empirical evaluation layer to compare candidates based on how well they explain student response data. We apply the framework to a thermodynamics assessment dataset and benchmark locally deployed LLMs against cloud-served models. Results show that iteratively refined LLM-generated Q-matrices can exceed expert-baseline model fit (AUC 0.780 vs. 0.717), and…
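To show how a candidate Q-matrix enters NeuralCDM and how AUC ranks candidates, here is a minimal sketch. It assumes the NeuralCDM interaction from the original model, in which each item's Q-matrix row masks the gap between student proficiency and item difficulty, scaled by item discrimination. Everything else is hypothetical: the function names, the toy data, and the single non-negative output layer (the original model stacks several positive-weight layers). Parameter training is omitted, so this illustrates only the evaluation step, not the authors' pipeline.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def neuralcdm_predict(theta, q_row, h_diff, h_disc, weights, bias=0.0):
    """Simplified NeuralCDM-style forward pass for one student-item pair.

    theta:   student proficiency per concept, each in (0, 1)
    q_row:   the item's Q-matrix row (1 = concept required, 0 = not)
    h_diff:  item difficulty per concept, each in (0, 1)
    h_disc:  scalar item discrimination
    weights: one output layer; negatives are clipped to 0 to keep the
             monotonicity assumption (hypothetical simplification)
    """
    # The Q-matrix row zeroes out concepts the item does not require.
    x = [q * (t - d) * h_disc for q, t, d in zip(q_row, theta, h_diff)]
    z = sum(max(w, 0.0) * xi for w, xi in zip(weights, x)) + bias
    return sigmoid(z)


def auc(labels, scores):
    """ROC AUC via the Mann-Whitney statistic (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs both positive and negative labels")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def evaluate_q_matrix(q_matrix, responses, theta, h_diff, h_disc, weights):
    """Score one candidate Q-matrix on (student, item, correct) triples."""
    labels, scores = [], []
    for s, i, correct in responses:
        labels.append(correct)
        scores.append(
            neuralcdm_predict(theta[s], q_matrix[i], h_diff[i], h_disc[i], weights)
        )
    return auc(labels, scores)


# Hypothetical toy data: 2 students, 2 items, 2 concepts.
theta = [[0.9, 0.2], [0.2, 0.9]]        # student 0 knows concept 0, student 1 concept 1
h_diff = [[0.5, 0.5], [0.5, 0.5]]
h_disc = [0.8, 0.8]
weights = [1.0, 1.0]
responses = [(0, 0, 1), (0, 1, 0), (1, 0, 0), (1, 1, 1)]

q_a = [[1, 0], [0, 1]]  # candidate A: item 0 -> concept 0, item 1 -> concept 1
q_b = [[0, 1], [1, 0]]  # candidate B: concept assignments swapped

auc_a = evaluate_q_matrix(q_a, responses, theta, h_diff, h_disc, weights)  # 1.0
auc_b = evaluate_q_matrix(q_b, responses, theta, h_diff, h_disc, weights)  # 0.0
```

Here candidate A matches the toy response pattern and scores higher than candidate B. In a real comparison each candidate Q-matrix would get its own trained NeuralCDM, and AUC would be measured on held-out responses rather than with shared, hand-set parameters.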
