The Use of AI Tools to Develop and Validate Q-Matrices
Kevin Fan, Jacquelyn A. Bialo, Hongli Li

TL;DR
This study explores the potential of AI language models to assist in developing and validating Q-matrices for cognitive diagnostic modeling, comparing AI outputs with expert and validated matrices to assess accuracy.
Contribution
It provides an empirical comparison of AI-generated Q-matrices with validated and human-created matrices, highlighting AI's potential and current limitations in this task.
Findings
AI models showed substantial variation in agreement with the validated Q-matrix.
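Google Gemini 2.5 Pro achieved the highest agreement with the validated Q-matrix (Cohen's kappa = 0.63), exceeding that of all human experts.
In a January 2026 follow-up, newer AI versions showed lower agreement with the validated Q-matrix, suggesting that AI performance on this task changes across model versions.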
Abstract
Constructing a Q-matrix is a critical but labor-intensive step in cognitive diagnostic modeling (CDM). This study investigates whether AI tools (i.e., general language models) can support Q-matrix development by comparing AI-generated Q-matrices with a validated Q-matrix from Li and Suen (2013) for a reading comprehension test. In May 2025, multiple AI models were provided with the same training materials as human experts. Agreement among AI-generated Q-matrices, the validated Q-matrix, and human raters' Q-matrices was assessed using Cohen's kappa. Results showed substantial variation across AI models, with Google Gemini 2.5 Pro achieving the highest agreement (Kappa = 0.63) with the validated Q-matrix, exceeding that of all human experts. A follow-up analysis in January 2026 using newer AI versions, however, revealed lower agreement with the validated Q-matrix. Implications and…
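As a rough illustration of the agreement measure named in the abstract, the sketch below computes Cohen's kappa between two binary Q-matrices by treating every item-by-attribute cell as one rating. The matrices, the function names `cohens_kappa` and `flatten`, and the cell-by-cell flattening are all assumptions made for illustration; the abstract does not say how the authors arranged the entries or which software they used, and the toy data are not from the study.

```python
from typing import List, Sequence


def flatten(q_matrix: List[List[int]]) -> List[int]:
    """Turn an items-by-attributes Q-matrix into one list of 0/1 cells."""
    return [cell for row in q_matrix for cell in row]


def cohens_kappa(a: Sequence[int], b: Sequence[int]) -> float:
    """Cohen's kappa for two equally long sequences of categorical ratings."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(a)
    # Observed agreement: share of cells where the two raters coincide.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of the raters' marginal proportions, per category.
    categories = set(a) | set(b)
    p_e = sum((list(a).count(c) / n) * (list(b).count(c) / n) for c in categories)
    if p_e == 1.0:
        # Both raters used a single identical category; kappa is undefined,
        # so this sketch returns 1.0 by convention.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical toy data: 4 items x 3 attributes (NOT the Li and Suen matrix).
validated = [[1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
candidate = [[1, 0, 0], [0, 1, 1], [1, 0, 0], [0, 0, 1]]

kappa = cohens_kappa(flatten(validated), flatten(candidate))
print(round(kappa, 2))
```

In this toy case the two matrices agree on 10 of 12 cells, and correcting for chance agreement gives a kappa of 46/70, or about 0.66. Kappa is used rather than raw percent agreement because most Q-matrix cells are typically 0, which inflates simple agreement.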
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education · Human-Automation Interaction and Safety · Psychometric Methodologies and Testing
