MCQ Difficulty Prediction via Modeling Learner Heterogeneity Using Data-Driven Cognitive Profiling

Dhriti Krishnan; Jaromir Savelka

arXiv:2605.16290 · cs.CY · May 19, 2026

TL;DR

This paper introduces a data-driven cognitive profiling approach that uses student interaction data and latent class analysis (LCA) to capture learner heterogeneity and improve MCQ difficulty prediction.
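Latent class analysis on binary response data can be sketched as expectation-maximization over a mixture of independent Bernoulli variables. The sketch below is illustrative only and assumes each student is represented by a complete 0/1 vector of item correctness; the paper's actual LCA indicators, its handling of missing responses, and how it chooses the number of personas are not described here, and the function name `fit_lca` is hypothetical.

```python
import numpy as np


def fit_lca(X, n_classes, n_iter=200, seed=0, eps=1e-9):
    """Fit a latent class model to binary responses with EM.

    X: (n_students, n_items) array of 0/1 indicators (assumed complete).
    Returns class prevalences pi (K,), per-class item success
    probabilities theta (K, n_items), and posterior memberships (n_students, K).
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    pi = np.full(n_classes, 1.0 / n_classes)
    # Random, class-specific starting values break the symmetry between classes.
    theta = rng.uniform(0.25, 0.75, size=(n_classes, m))
    for _ in range(n_iter):
        # E-step: log-likelihood of each student's responses under each class,
        # assuming items are conditionally independent given the class.
        log_lik = X @ np.log(theta + eps).T + (1 - X) @ np.log(1 - theta + eps).T
        log_post = log_lik + np.log(pi + eps)
        log_post -= log_post.max(axis=1, keepdims=True)  # numerical stability
        post = np.exp(log_post)
        post /= post.sum(axis=1, keepdims=True)
        # M-step: update class prevalences and per-class item probabilities.
        nk = post.sum(axis=0)
        pi = nk / n
        theta = (post.T @ X) / (nk[:, None] + eps)
    return pi, theta, post
```

Each row of `theta` then describes one candidate persona (for example, a class that succeeds on most items except a particular topic), and `post.argmax(axis=1)` assigns students to personas.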

Contribution

It replaces theoretical ability sampling with behavioral personas discovered from data, and conditions an LLM on each persona to better model student heterogeneity when predicting difficulty.
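The persona-conditioning step itself requires an LLM, but the aggregation that follows can be illustrated on its own. In this hypothetical sketch, each persona's simulated response distribution over answer options is assumed to be already parsed from the model's output, personas are weighted by their LCA prevalence, and the function and feature names (`persona_features`, `mix_p_correct`, `top_distractor_share`, and so on) are illustrative assumptions rather than the paper's actual feature set; topic-context features would be concatenated alongside them.

```python
import numpy as np


def persona_features(option_probs, prevalence, correct_idx):
    """Aggregate per-persona simulated answer distributions into item features.

    option_probs: (n_personas, n_options); each row is a probability
        distribution over answer options (e.g. parsed from LLM output).
    prevalence: (n_personas,) persona weights, e.g. LCA class sizes.
    correct_idx: index of the keyed (correct) option.
    """
    P = np.asarray(option_probs, dtype=float)
    P = P / P.sum(axis=1, keepdims=True)  # renormalize each persona's row
    w = np.asarray(prevalence, dtype=float)
    w = w / w.sum()  # turn class sizes into mixture weights
    mixture = w @ P  # population-level distribution over options
    p_correct = P[:, correct_idx]  # each persona's probability of success
    distractors = np.delete(mixture, correct_idx)
    nz = mixture[mixture > 0]  # skip zero entries in the entropy sum
    return {
        "mix_p_correct": float(mixture[correct_idx]),
        "min_persona_p_correct": float(p_correct.min()),
        "max_persona_p_correct": float(p_correct.max()),
        "top_distractor_share": float(distractors.max()),
        "mix_entropy": float(-(nz * np.log(nz)).sum()),
    }
```

For two personas with prevalences 3:1 whose simulated distributions over options A to D are [0.7, 0.1, 0.1, 0.1] and [0.2, 0.6, 0.1, 0.1], with A keyed correct, the mixture assigns 0.575 to the correct option and 0.225 to the strongest distractor (B). Keeping per-persona extremes alongside the mixture preserves the signal that a single-ability average would blur: an item can look moderately easy overall while one persona almost always falls for the same distractor.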

Findings

01

Lower prediction error (MSE: 0.367 to 0.274; R²: 0.525 to 0.686)

02

Personas are interpretable and provide insights into item difficulty

03

Method outperforms a recent baseline model
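For scale, the reported drop in mean squared error is roughly a one-quarter relative reduction, and the R² values reported in the abstract rise by about 0.16:

```latex
\frac{\mathrm{MSE}_{\text{baseline}} - \mathrm{MSE}_{\text{persona}}}{\mathrm{MSE}_{\text{baseline}}}
  = \frac{0.367 - 0.274}{0.367} = \frac{0.093}{0.367} \approx 0.253,
\qquad
\Delta R^2 = 0.686 - 0.525 = 0.161
```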

Abstract

Predicting the difficulty of multiple-choice questions (MCQs) is important for effective assessment, yet current methods typically assume a unimodal distribution of student ability, overlooking the heterogeneous nature of student misconceptions. We propose a persona-driven framework that replaces theoretical ability sampling with data-driven cognitive profiling. Using student interactions from the EEDI dataset, we identify behavioral personas via latent class analysis (LCA), then condition a large language model (LLM) to simulate a response distribution for each persona. These signals are combined with topic context and fed into a ridge regression model that predicts the item response theory (IRT) difficulty parameter. Under five-fold cross-validation, our method improves over a recent baseline (MSE: 0.367 to 0.274; R²: 0.525 to 0.686). The discovered personas are interpretable and offer…
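The final stage, regressing the IRT difficulty parameter on persona and topic features, can be sketched with closed-form ridge regression and five-fold cross-validation in NumPy. This is an illustration under assumptions: the regularization strength `alpha`, the fold shuffling, and pooling out-of-fold predictions before computing MSE and R² are choices made here rather than details taken from the paper (which may, for example, average per-fold scores), and `fit_ridge` and `cross_validate` are hypothetical names. The target `y` would hold per-item difficulty estimates, such as b_j in a Rasch-style model P(correct) = σ(θ - b_j); which IRT variant the authors fit is not stated in the abstract.

```python
import numpy as np


def fit_ridge(X, y, alpha):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    # Centering removes the intercept from the penalized problem.
    Xc, yc = X - x_mean, y - y_mean
    d = X.shape[1]
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(d), Xc.T @ yc)
    b = y_mean - x_mean @ w
    return w, b


def cross_validate(X, y, alpha=1.0, n_folds=5, seed=0):
    """Return pooled out-of-fold MSE and R^2 from k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))  # shuffle items before splitting into folds
    folds = np.array_split(idx, n_folds)
    preds = np.empty_like(y, dtype=float)
    for k in range(n_folds):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        w, b = fit_ridge(X[train], y[train], alpha)
        preds[test] = X[test] @ w + b  # predict only on the held-out fold
    mse = float(np.mean((y - preds) ** 2))
    # R^2 = 1 - SSE / SST; dividing both by n gives 1 - MSE / Var(y).
    r2 = 1.0 - mse / float(np.var(y))
    return mse, r2
```

Centering `X` and `y` before solving keeps the intercept out of the penalty, which matches the default behavior of common ridge implementations; in practice `alpha` would itself be tuned inside each training fold.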