A Framework for Human-AI Q-Matrix Refinement: A NeuralCDM Evaluation

Ying Zhang; Ningxi Cheng; Yizhu Gao; Hongmei Li; Lehong Shi; Nicholas Young; Geng Yuan; Xiaoming Zhai

arXiv:2604.16398 · cs.CY · April 21, 2026


TL;DR

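This paper introduces a human-AI collaborative framework for refining assessment Q-matrices: large language models generate candidate Q-matrices, and NeuralCDM evaluates them against student response data. The approach improves model fit over an expert baseline and supports privacy-preserving deployment with locally hosted models.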

Contribution

The paper presents a framework that combines LLMs with NeuralCDM for efficient, empirically grounded Q-matrix refinement. The refined Q-matrices surpass the expert baseline in model fit.

Findings

01. LLM-generated Q-matrices can outperform expert-crafted ones in model fit.

02. Locally deployed LLMs achieve performance comparable to cloud models.

03. Iterative refinement enhances the explanatory power of Q-matrices.

Abstract

Q-matrices are a cornerstone of theory-driven assessment and learning analytics, making item demands and students' underlying knowledge components and misconceptions explicit and actionable. However, Q-matrices are typically crafted by experts, making them time-consuming to build, prone to subjectivity, and difficult to validate empirically. We propose a framework for human-AI Q-matrix refinement in which large language models (LLMs) generate candidate Q-matrices using structured, misconception-aware prompting, and NeuralCDM provides an empirical evaluation layer to compare candidates based on how well they explain student response data. We apply the framework to a thermodynamics assessment dataset and benchmark locally deployed LLMs against cloud-served models. Results show that iteratively refined LLM-generated Q-matrices can exceed expert-baseline model fit (AUC 0.780 vs. 0.717), and…
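The evaluation step described in the abstract, comparing candidate Q-matrices by how well they explain response data, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names, the toy data, and above all `toy_predict`, which is a simple averaging rule standing in for a trained diagnosis model and is not NeuralCDM. In the paper's framework the model fit comes from NeuralCDM trained on real student responses; here the student mastery values are simply given.

```python
from typing import Dict, List, Sequence, Tuple

# A Q-matrix: rows are items, columns are knowledge components or misconceptions.
QMatrix = List[List[int]]
# One response record: (student index, item index, 1 if correct else 0).
Response = Tuple[int, int, int]


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve by pairwise comparison; ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def toy_predict(q: QMatrix, mastery: Sequence[float], item: int) -> float:
    """Hypothetical stand-in for a trained model (NOT NeuralCDM): predicted
    P(correct) is the mean mastery over the attributes the item requires."""
    required = [m for m, flag in zip(mastery, q[item]) if flag]
    return sum(required) / len(required) if required else 0.5


def score_q_matrix(q: QMatrix, responses: Sequence[Response],
                   mastery: Sequence[Sequence[float]]) -> float:
    """Score one candidate Q-matrix by the AUC of its predictions."""
    labels = [correct for _, _, correct in responses]
    scores = [toy_predict(q, mastery[s], i) for s, i, _ in responses]
    return auc(labels, scores)


def select_best(candidates: Dict[str, QMatrix], responses: Sequence[Response],
                mastery: Sequence[Sequence[float]]) -> Tuple[str, float]:
    """Return the name and AUC of the best-fitting candidate."""
    scored = {name: score_q_matrix(q, responses, mastery)
              for name, q in candidates.items()}
    best = max(scored, key=scored.get)
    return best, scored[best]


# Toy data (invented for this sketch): 4 students, 2 items, 2 attributes.
mastery = [[0.9, 0.1], [0.1, 0.9], [0.9, 0.9], [0.1, 0.1]]
# Responses generated as if item 0 needs attribute 0 and item 1 needs attribute 1.
responses = [(0, 0, 1), (0, 1, 0), (1, 0, 0), (1, 1, 1),
             (2, 0, 1), (2, 1, 1), (3, 0, 0), (3, 1, 0)]
candidates = {
    "candidate_a": [[0, 1], [1, 0]],  # item-attribute links swapped
    "candidate_b": [[1, 0], [0, 1]],  # matches how the toy responses were generated
}

best_name, best_auc = select_best(candidates, responses, mastery)
```

On this toy data the candidate whose item-attribute links match the response pattern is selected. In an iterative setting, the same comparison would be rerun each time a revised candidate is proposed, keeping whichever candidate fits best so far.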
