MCQ Difficulty Prediction via Modeling Learner Heterogeneity Using Data-Driven Cognitive Profiling
Dhriti Krishnan, Jaromir Savelka

TL;DR
This paper introduces a data-driven cognitive profiling approach using student interaction data and latent class analysis to improve MCQ difficulty prediction, capturing learner heterogeneity.
Contribution
The paper replaces theoretical ability sampling with behavioral personas discovered through latent class analysis, and conditions an LLM on each persona to better model student heterogeneity in difficulty prediction.
Findings
Lower prediction error under five-fold cross-validation (MSE: 0.367 to 0.274; R²: 0.525 to 0.686)
Personas are interpretable and provide insights into item difficulty
Method outperforms a recent baseline
Abstract
Predicting the difficulty of multiple-choice questions (MCQs) is important for effective assessment, yet current methods typically assume a unimodal student ability distribution, overlooking the heterogeneous nature of student misconceptions. We propose a persona-driven framework that replaces theoretical ability sampling with data-driven cognitive profiling. Using student interactions from the EEDI dataset, we identify behavioral personas via latent class analysis (LCA), then condition a large language model (LLM) to simulate response distributions for each persona. These signals are aggregated with topic context and fed into a Ridge Regression model to predict the item response theory (IRT) difficulty parameter. With five-fold cross-validation, our method improves over a recent baseline (MSE: 0.367 to 0.274; R²: 0.525 to 0.686). The discovered personas are interpretable and offer…
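To make the aggregation step concrete, here is a minimal sketch of how per-persona simulated response distributions might be combined into item-level features for a downstream regressor. The abstract does not specify the aggregation, so everything here is an assumption for illustration: the persona names, the class weights, the function `aggregate_persona_signals`, and the choice of features (mixture probability of the correct option, mixture entropy, and per-persona correctness).

```python
from math import log


def aggregate_persona_signals(persona_weights, persona_response_probs, correct_option):
    """Mix per-persona option distributions into one item-level feature dict.

    persona_weights: hypothetical class shares from LCA, e.g. {"careful": 0.5, ...}
    persona_response_probs: per-persona simulated option probabilities for one item
    correct_option: the key of the correct answer option
    """
    total = sum(persona_weights.values())
    options = sorted({opt for probs in persona_response_probs.values() for opt in probs})

    # Population-level response distribution as a persona-weighted mixture.
    mixture = {
        opt: sum(
            persona_weights[p] / total * persona_response_probs[p].get(opt, 0.0)
            for p in persona_weights
        )
        for opt in options
    }

    # Assumed features: overall chance of a correct answer and how spread out
    # the answers are across options (Shannon entropy, natural log).
    features = {
        "p_correct": mixture.get(correct_option, 0.0),
        "entropy": -sum(q * log(q) for q in mixture.values() if q > 0),
    }

    # Keep each persona's own correctness so the regressor can see heterogeneity.
    for p in persona_weights:
        features[f"p_correct_{p}"] = persona_response_probs[p].get(correct_option, 0.0)
    return features


# Hypothetical personas and simulated distributions for a single item.
weights = {"careful": 0.5, "guesser": 0.25, "misconception": 0.25}
simulated = {
    "careful": {"A": 0.8, "B": 0.1, "C": 0.05, "D": 0.05},
    "guesser": {"A": 0.25, "B": 0.25, "C": 0.25, "D": 0.25},
    "misconception": {"A": 0.2, "B": 0.7, "C": 0.05, "D": 0.05},
}
features = aggregate_persona_signals(weights, simulated, correct_option="A")
```

In a setup like the one the abstract describes, a feature dictionary of this kind, joined with topic context, would form one row of the design matrix for the ridge regression that predicts the IRT difficulty parameter.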
