The Use of AI Tools to Develop and Validate Q-Matrices

Kevin Fan; Jacquelyn A. Bialo; Hongli Li

arXiv:2602.08796 · cs.AI · February 10, 2026

TL;DR

This study explores whether general-purpose AI language models can assist in developing and validating Q-matrices for cognitive diagnostic modeling (CDM), comparing AI-generated Q-matrices with human experts' Q-matrices and a validated Q-matrix to assess agreement.

Contribution

The study provides an empirical comparison of AI-generated Q-matrices with a validated Q-matrix and with human experts' Q-matrices, highlighting both the potential and the current limitations of AI for this task.

Findings

1. AI models varied substantially in their agreement with the validated Q-matrix.
2. Google Gemini 2.5 Pro achieved the highest agreement (Cohen's kappa = 0.63), exceeding that of all human experts.
3. Newer AI model versions tested in January 2026 showed lower agreement with the validated Q-matrix, suggesting that AI performance on this task can change across model updates.

Abstract

Constructing a Q-matrix is a critical but labor-intensive step in cognitive diagnostic modeling (CDM). This study investigates whether AI tools (i.e., general language models) can support Q-matrix development by comparing AI-generated Q-matrices with a validated Q-matrix from Li and Suen (2013) for a reading comprehension test. In May 2025, multiple AI models were provided with the same training materials as human experts. Agreement among AI-generated Q-matrices, the validated Q-matrix, and human raters' Q-matrices was assessed using Cohen's kappa. Results showed substantial variation across AI models, with Google Gemini 2.5 Pro achieving the highest agreement (Kappa = 0.63) with the validated Q-matrix, exceeding that of all human experts. A follow-up analysis in January 2026 using newer AI versions, however, revealed lower agreement with the validated Q-matrix. Implications and…
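The agreement statistic named in the abstract can be illustrated with a short sketch. It assumes that every item-by-attribute cell of two binary Q-matrices is treated as one paired rating, so that kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of matching cells and p_e is the agreement expected by chance from each matrix's proportion of 1s. The function name `cohens_kappa` and the small matrices are hypothetical illustrations, not the study's data, and the study may have pooled or aggregated the cells differently (for example, per attribute).

```python
import math
from typing import Sequence

# A Q-matrix: rows are items, columns are attributes; 1 = the item requires the attribute.
QMatrix = Sequence[Sequence[int]]


def cohens_kappa(q_a: QMatrix, q_b: QMatrix) -> float:
    """Cohen's kappa between two binary Q-matrices of the same shape.

    Assumption: each item-by-attribute cell counts as one paired rating
    (all cells pooled); this is an illustrative choice, not the paper's stated procedure.
    """
    if len(q_a) != len(q_b) or any(len(ra) != len(rb) for ra, rb in zip(q_a, q_b)):
        raise ValueError("Q-matrices must have the same shape")
    a = [x for row in q_a for x in row]
    b = [x for row in q_b for x in row]
    if not a:
        raise ValueError("Q-matrices must not be empty")
    if any(x not in (0, 1) for x in a + b):
        raise ValueError("Q-matrix entries must be 0 or 1")
    n = len(a)
    # Observed agreement: share of cells where the two matrices match.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement, from each matrix's marginal proportion of 1s (and 0s).
    p_a1, p_b1 = sum(a) / n, sum(b) / n
    p_e = p_a1 * p_b1 + (1 - p_a1) * (1 - p_b1)
    if p_e == 1.0:
        # Both matrices are identical and constant (all 0s or all 1s): kappa is undefined.
        return math.nan
    return (p_o - p_e) / (1 - p_e)


# Hypothetical 4-item x 3-attribute matrices (illustrative only, not the study's data).
validated = [[1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
ai_generated = [[1, 0, 0], [0, 1, 1], [1, 0, 0], [0, 0, 1]]

print(round(cohens_kappa(validated, ai_generated), 2))  # 0.66
```

Here the two hypothetical matrices match on 10 of 12 cells (p_o ≈ 0.83), and both contain five 1s, so p_e = 74/144 and kappa = 46/70 ≈ 0.66. A kappa of 1 means perfect agreement and 0 means agreement no better than chance. Pooling all cells is only one option; computing a separate kappa per attribute is another. For pooled binary entries, `sklearn.metrics.cohen_kappa_score` applied to the flattened lists should give the same value.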

Taxonomy

Topics: Artificial Intelligence in Healthcare and Education · Human-Automation Interaction and Safety · Psychometric Methodologies and Testing