Scalable Text-Embedding-informed Cognitive Diagnosis of Large Language Models

Jia Liu, Zhiyu Xu, and Yuqi Gu

arXiv:2603.14676 · stat.ME · March 17, 2026

TL;DR

This paper introduces a scalable, fine-grained evaluation method for large language models based on cognitive diagnosis models (CDMs), incorporating the text of benchmark items to diagnose strengths and weaknesses across a large number of capabilities (more than 20 latent attributes).

Contribution

It adapts psychometric cognitive diagnosis models to large-scale LLM evaluation, enabling detailed capability profiling through text-informed priors and efficient estimation algorithms.
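The capability profiling described above can be illustrated with a standard CDM. Each LLM has a binary mastery profile α over K attributes, and a binary item-attribute Q-matrix records which attributes each item requires. The sketch below assumes the DINA model purely as an example; this page does not state which response model the paper uses. The function name `dina_correct_prob` and the slip/guess values are hypothetical.

```python
import numpy as np

def dina_correct_prob(alpha, Q, slip, guess):
    """Probability of a correct response to each item under the DINA model
    (an illustrative CDM choice, not necessarily the paper's model).

    alpha : (K,) binary mastery profile of one LLM
    Q     : (J, K) binary item-attribute matrix
    slip, guess : (J,) item parameters (scalars also work)
    """
    alpha = np.asarray(alpha, dtype=int)
    Q = np.asarray(Q, dtype=int)
    # eta_j = True iff the LLM masters every attribute that item j requires
    eta = np.all(alpha[None, :] >= Q, axis=1)
    # Masters answer correctly unless they slip; non-masters only by guessing
    return np.where(eta, 1.0 - slip, guess)

# Hypothetical example: 3 items, 3 attributes
Q = np.array([[1, 0, 0],
              [1, 1, 0],
              [0, 0, 1]])
alpha = [1, 1, 0]            # masters attributes 1 and 2, not attribute 3
slip = np.full(3, 0.1)
guess = np.full(3, 0.2)
p = dina_correct_prob(alpha, Q, slip, guess)
print(p)  # -> [0.9 0.9 0.2]
```

Under this model, the Q-matrix and the mastery profile together determine which items an LLM is expected to solve, which is why estimating both jointly yields an interpretable strengths-and-weaknesses profile.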

Findings

01

Accurate mastery profile estimation demonstrated in simulations

02

Effective application to the MATH Level 5 benchmark

03

Uncovered detailed LLM strengths and weaknesses

Abstract

Large language models (LLMs) have achieved remarkable performance on diverse benchmarks, yet existing evaluation practices largely rely on coarse summary metrics that obscure underlying reasoning abilities. In this work, we propose novel methodologies to adapt cognitive diagnosis models (CDMs) in psychometrics to LLM evaluation, enabling fine-grained diagnosis via multidimensional discrete capability profiles and interpretable characterizations of LLM strengths and weaknesses. First, to enable CDM-based evaluation at benchmark scale (more than 1000 items), we propose a scalable method that jointly estimates LLM mastery profiles and the item-attribute Q-matrix, addressing key challenges posed by high-dimensional latent attributes (K > 20), large item pools, and the prohibitive computational cost of existing marginal maximum likelihood-based estimation. Second, we incorporate item-level…
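The computational obstacle that the abstract attributes to marginal maximum likelihood can be seen by counting. The marginal likelihood of one LLM's responses averages over all 2^K possible mastery profiles, so K = 20 already gives 2^20 = 1,048,576 profiles, each evaluated against every item. The sketch below computes that marginal log-likelihood by brute-force enumeration. It assumes the DINA model and a uniform prior over profiles, both illustrative choices rather than the paper's specification, and the function name `dina_marginal_loglik` is hypothetical. Enumeration of this kind is feasible only for small K.

```python
import itertools
import numpy as np

def dina_marginal_loglik(y, Q, slip, guess):
    """Marginal log-likelihood of one response vector under DINA,
    averaging over all 2**K mastery profiles with a uniform prior
    (illustrative assumptions, not the paper's estimator)."""
    y = np.asarray(y, dtype=int)
    Q = np.asarray(Q, dtype=int)
    J, K = Q.shape
    per_profile = []
    for alpha in itertools.product([0, 1], repeat=K):  # 2**K profiles
        eta = np.all(np.array(alpha)[None, :] >= Q, axis=1)
        p = np.where(eta, 1.0 - slip, guess)
        # Bernoulli log-likelihood of the J responses given this profile
        per_profile.append(np.sum(y * np.log(p) + (1 - y) * np.log(1 - p)))
    per_profile = np.array(per_profile)
    # log((1 / 2**K) * sum_alpha exp(l_alpha)), computed stably
    m = per_profile.max()
    return m + np.log(np.mean(np.exp(per_profile - m)))

# Number of latent profiles the marginal likelihood must sum over
for K in (3, 10, 20):
    print(K, 2**K)
```

Because every likelihood evaluation touches all 2^K profiles for every LLM and item, the cost grows exponentially in K, which motivates estimation approaches that avoid this full enumeration.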

Taxonomy

Topics: Psychometric Methodologies and Testing · Topic Modeling · Mental Health via Writing