Exploring the Potential of Large Language Models for Estimating the Reading Comprehension Question Difficulty

Yoshee Jain; John Hollander; Amber He; Sunny Tang; Liang Zhang; John Sabatini

arXiv:2502.17785·cs.CL·February 26, 2025

Open Access

TL;DR

This study explores the use of large language models, specifically OpenAI's GPT-4o and o1, to automate the estimation of reading comprehension question difficulty, aiming to enhance scalability and personalization in educational assessments.

Contribution

It examines whether GPT-4o and o1 can classify question difficulty levels as defined by traditional psychometric measures such as IRT, and highlights their potential for scalable, adaptive educational systems.

Findings

01

The models' difficulty estimates correlate with IRT parameters

02

Models show sensitivity to extreme item characteristics

03

Potential for scalable, automated assessment in education

Abstract

Reading comprehension is key to individual success, yet the assessment of question difficulty remains challenging due to the extensive human annotation and large-scale testing required by traditional methods such as linguistic analysis and Item Response Theory (IRT). While these robust approaches provide valuable insights, their scalability is limited. There is potential for Large Language Models (LLMs) to automate question difficulty estimation; however, this area remains underexplored. Our study investigates the effectiveness of LLMs, specifically OpenAI's GPT-4o and o1, in estimating the difficulty of reading comprehension questions using the Study Aid and Reading Assessment (SARA) dataset. We evaluated both the accuracy of the models in answering comprehension questions and their ability to classify difficulty levels as defined by IRT. The results indicate that, while the models…
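To make the idea of "difficulty levels as defined by IRT" concrete, the following is a minimal sketch, not the authors' pipeline. It assumes a two-parameter logistic (2PL) IRT model, which the abstract does not specify, and the function names, the cut-offs on the difficulty parameter `b`, and the three level labels are all hypothetical choices made for illustration.

```python
import math


def p_correct_2pl(theta, a, b):
    """2PL IRT: probability that a reader with ability theta answers an item
    with discrimination a and difficulty b correctly (assumed model)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def difficulty_level(b, easy_max=-0.5, hard_min=0.5):
    """Bin an IRT difficulty parameter into a coarse level.
    The thresholds are hypothetical, not taken from the paper."""
    if b < easy_max:
        return "easy"
    if b > hard_min:
        return "hard"
    return "medium"


def agreement(irt_levels, llm_levels):
    """Fraction of items where the LLM-assigned level matches the IRT level."""
    if len(irt_levels) != len(llm_levels):
        raise ValueError("both label lists must cover the same items")
    if not irt_levels:
        raise ValueError("at least one item is required")
    matches = sum(x == y for x, y in zip(irt_levels, llm_levels))
    return matches / len(irt_levels)
```

In a sketch like this, the IRT-derived levels would come from parameters fitted on real response data, and the LLM-assigned levels from prompting the model on each question; `agreement` then gives one simple way to compare the two.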

Taxonomy

Topics: Text Readability and Simplification · Intelligent Tutoring Systems and Adaptive Learning

Methods: ALIGN