Are You Doubtful? Oh, It Might Be Difficult Then! Exploring the Use of Model Uncertainty for Question Difficulty Estimation

Leonidas Zotos; Hedderik van Rijn; Malvina Nissim

arXiv:2412.11831 · cs.CL · April 21, 2025

TL;DR

This paper proposes using large language model uncertainty to automatically estimate question difficulty in multiple-choice assessments, achieving state-of-the-art results and offering a promising alternative to human evaluation.

Contribution

It introduces a method that combines model-uncertainty features with textual features to predict question difficulty, outperforming previous approaches.

Findings

01. Uncertainty features significantly improve difficulty prediction.
02. The model achieves state-of-the-art results on the USMLE and CMCQRD datasets.
03. Difficulty, as defined in the paper, is inversely related to the proportion of students who answer a question correctly.

Abstract

In an educational setting, an estimate of the difficulty of multiple-choice questions (MCQs), a commonly used tool for assessing learning progress, is very useful information for both teachers and students. Since human assessment is costly in several respects, automatic approaches to MCQ item difficulty estimation have been investigated, so far with mixed success. Our approach takes a different angle from previous work: we ask various Large Language Models to answer the questions in three different MCQ datasets and leverage model uncertainty to estimate item difficulty. Using both model uncertainty features and textual features in a Random Forest regressor, we show that uncertainty features contribute substantially to difficulty prediction, where difficulty is inversely proportional to the number of students who can correctly…
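The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes each LLM yields one probability per answer option, and it derives three hypothetical uncertainty features per model (probability of the correct option, maximum option probability, and entropy over options) plus two hypothetical textual features (question length and mean option length). The target `difficulty_target` assumes difficulty is the complement of the proportion of students answering correctly; the paper's exact features and difficulty scale may differ. All function names are illustrative.

```python
import math
from typing import Dict, List, Sequence


def option_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a distribution over answer options."""
    total = sum(probs)
    return -sum((p / total) * math.log(p / total) for p in probs if p > 0)


def uncertainty_features(
    per_model_probs: Dict[str, Sequence[float]], correct_idx: int
) -> Dict[str, float]:
    """Hypothetical per-model uncertainty features for one MCQ.

    Assumption: each model provides a (possibly unnormalized) probability
    for every answer option, in the same option order.
    """
    feats: Dict[str, float] = {}
    for name, probs in per_model_probs.items():
        total = sum(probs)
        norm = [p / total for p in probs]  # renormalize over the options
        feats[f"{name}_p_correct"] = norm[correct_idx]
        feats[f"{name}_max_p"] = max(norm)
        feats[f"{name}_entropy"] = option_entropy(norm)
    return feats


def textual_features(question: str, options: List[str]) -> Dict[str, float]:
    """Hypothetical surface-level textual features (simple word counts)."""
    return {
        "question_words": float(len(question.split())),
        "mean_option_words": sum(len(o.split()) for o in options) / len(options),
    }


def difficulty_target(n_correct: int, n_students: int) -> float:
    """Assumed target: 1 minus the proportion of students answering correctly."""
    return 1.0 - n_correct / n_students


def feature_row(
    question: str,
    options: List[str],
    per_model_probs: Dict[str, Sequence[float]],
    correct_idx: int,
) -> Dict[str, float]:
    """Merge uncertainty and textual features into one regressor input row."""
    row = uncertainty_features(per_model_probs, correct_idx)
    row.update(textual_features(question, options))
    return row


if __name__ == "__main__":
    row = feature_row(
        "What is the capital of France?",
        ["Paris", "New York City"],
        {"model_a": [0.9, 0.1], "model_b": [0.5, 0.5]},
        correct_idx=0,
    )
    print(row)
    print(difficulty_target(30, 40))
```

Rows like these, one per question, would then be paired with the difficulty targets and fed to a Random Forest regressor (for example scikit-learn's `RandomForestRegressor`, which is an assumed choice of library here). A tree ensemble is a natural fit because it handles a mix of probability-based and count-based features on different scales without manual normalization.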


Taxonomy

Topics: Bayesian Modeling and Causal Inference · Machine Learning and Algorithms