Do LLMs Give Psychometrically Plausible Responses in Educational Assessments?

Andreas Säuberli; Diego Frassinelli; Barbara Plank

arXiv:2506.09796·cs.CL·June 12, 2025

TL;DR

This study assesses whether instruction-tuned large language models (LLMs) respond to educational assessment items the way human test takers do, and therefore whether they could serve as pilot participants in test development.

Contribution

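The paper evaluates the psychometric plausibility of responses from 18 instruction-tuned LLMs on multiple-choice test items in three subjects (reading, U.S. history, and economics), using two frameworks from psychometrics: classical test theory and item response theory.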
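
As a rough illustration of the classical test theory side, items are commonly summarized by their difficulty (the proportion of test takers who answer correctly) and their discrimination (how strongly the item score correlates with the score on the remaining items). The sketch below computes both from a small, made-up 0/1 response matrix. The function names and the data are hypothetical, and this summary does not say which exact statistics the paper uses, so this is a generic example and not the authors' procedure.

```python
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (NaN if either is constant)."""
    mx, my = mean(x), mean(y)
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        return float("nan")
    cov = mean((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (sx * sy)


def item_difficulty(responses):
    """Proportion of correct answers per item (higher means easier)."""
    n_items = len(responses[0])
    return [mean(row[j] for row in responses) for j in range(n_items)]


def item_discrimination(responses):
    """Corrected item-total correlation: item score vs. score on all other items."""
    n_items = len(responses[0])
    result = []
    for j in range(n_items):
        item = [row[j] for row in responses]
        rest = [sum(row) - row[j] for row in responses]
        result.append(pearson(item, rest))
    return result


# Hypothetical data: rows are test takers, columns are items, 1 = correct.
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]

print(item_difficulty(responses))
print(item_discrimination(responses))
```

The same two functions could be applied to a matrix of human responses and to a matrix of sampled LLM responses, so that the resulting per-item statistics can be compared.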

Findings

01 Larger models are overly confident in their responses.

02 Temperature scaling improves the human-likeness of LLM responses.

03 LLM responses correlate better with human responses in reading comprehension than in the other subjects.
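
To make the temperature-scaling finding concrete, here is a minimal sketch of how a temperature parameter reshapes a probability distribution over answer options. It assumes the model exposes one logit per option and that temperature is applied by dividing the logits before the softmax. The function name and the logit values are hypothetical, and this summary does not describe how the paper applies or tunes the temperature.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Convert answer-option logits to probabilities after dividing by a temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for the four options of one multiple-choice item.
logits = [2.0, 1.0, 0.5, 0.0]

for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Temperatures above 1 flatten the distribution and temperatures below 1 sharpen it, which is why raising the temperature can counteract an overconfident model that puts nearly all probability on one option.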

Abstract

Knowing how test takers answer items in educational assessments is essential for test development, to evaluate item quality, and to improve test validity. However, this process usually requires extensive pilot studies with human participants. If large language models (LLMs) exhibit human-like response behavior to test items, this could open up the possibility of using them as pilot participants to accelerate test development. In this paper, we evaluate the human-likeness or psychometric plausibility of responses from 18 instruction-tuned LLMs with two publicly available datasets of multiple-choice test items across three subjects: reading, U.S. history, and economics. Our methodology builds on two theoretical frameworks from psychometrics which are commonly used in educational assessment, classical test theory and item response theory. The results show that while larger models are…

Taxonomy

Topics: Psychometric Methodologies and Testing · Student Assessment and Feedback · Intelligent Tutoring Systems and Adaptive Learning