Do LLMs Give Psychometrically Plausible Responses in Educational Assessments?
Andreas Säuberli, Diego Frassinelli, Barbara Plank

TL;DR
This study assesses whether instruction-tuned large language models produce responses similar to humans in educational assessments, exploring their potential to serve as pilot participants for test development.
Contribution
The paper evaluates the psychometric plausibility of responses from 18 instruction-tuned LLMs using classical test theory and item response theory, across three subjects: reading, U.S. history, and economics.
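Classical test theory mainly describes each item by two statistics. Difficulty is the proportion of test takers who answer the item correctly. Discrimination is how well the item separates stronger from weaker test takers, commonly measured as the correlation between the item score and the total score on the remaining items. The sketch below computes both from a 0/1 response matrix. The function names are hypothetical, and the corrected item-total correlation is one standard choice, not necessarily the exact statistic used in the paper. Running the same function on human responses and on simulated LLM responses would produce two sets of item statistics that can be compared directly.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ctt_item_stats(responses):
    """Classical test theory statistics for a 0/1 response matrix.

    responses: list of rows, one row per test taker, one 0/1 entry per item.
    Returns (difficulty, discrimination), one value per item:
      difficulty     = proportion of test takers answering the item correctly
      discrimination = corrected item-total correlation, i.e. the Pearson
                       correlation between the item score and the total score
                       on all *other* items.
    Items that everyone (or no one) answers correctly have zero variance,
    so their discrimination is undefined and raises ZeroDivisionError.
    """
    n_items = len(responses[0])
    totals = [sum(row) for row in responses]
    difficulty, discrimination = [], []
    for j in range(n_items):
        item = [row[j] for row in responses]
        rest = [t - x for t, x in zip(totals, item)]  # total score without item j
        difficulty.append(sum(item) / len(item))
        discrimination.append(pearson(item, rest))
    return difficulty, discrimination


# Hypothetical responses of four test takers to three items.
X = [[1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 0]]
difficulty, discrimination = ctt_item_stats(X)  # difficulty: 0.75, 0.5, 0.25
```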
Findings
Larger models are overly confident in their responses.
Temperature scaling improves the human-likeness of LLM responses.
LLM responses correlate better with human responses in reading comprehension than in U.S. history or economics.
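The temperature-scaling finding can be illustrated with a minimal sketch. It assumes that the model exposes logits for the answer options of each multiple-choice item. Dividing the logits by a temperature T > 1 before the softmax flattens the option distribution, which counteracts overconfidence. Sampling from the flattened distribution then yields simulated test takers. The helper `fit_temperature` picks T by grid search so that the model's probability of the correct option matches the human proportion correct. It is a hypothetical calibration procedure for illustration, not the paper's documented method. All function names and logit values here are assumptions.

```python
import math
import random


def option_distribution(logits, temperature=1.0):
    """Temperature-scaled softmax over answer-option logits.

    temperature > 1 flattens the distribution (less confident);
    temperature < 1 sharpens it.
    """
    z = [x / temperature for x in logits]
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def simulate_responses(logits, temperature, n_takers, rng):
    """Sample one chosen option index per simulated test taker."""
    p = option_distribution(logits, temperature)
    return rng.choices(range(len(p)), weights=p, k=n_takers)


def fit_temperature(item_logits, correct_idx, human_p_correct, grid):
    """Hypothetical calibration: choose the temperature from `grid` whose implied
    probability of the correct option best matches human proportions correct
    (mean squared error over items)."""
    best_t, best_err = None, math.inf
    for t in grid:
        model_p = [option_distribution(x, t)[c] for x, c in zip(item_logits, correct_idx)]
        err = sum((m - h) ** 2 for m, h in zip(model_p, human_p_correct)) / len(model_p)
        if err < best_err:
            best_t, best_err = t, err
    return best_t


logits = [4.0, 1.0, 0.5, 0.0]  # hypothetical logits for options A-D; A is the key
for t in (1.0, 2.0):
    print(t, [round(p, 3) for p in option_distribution(logits, t)])
answers = simulate_responses(logits, 2.0, n_takers=100, rng=random.Random(0))
```

The proportion of simulated answers that hit the key plays the role of a classical item difficulty, so tempered LLM responses can be compared with human item statistics on the same scale.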
Abstract
Knowing how test takers answer items in educational assessments is essential for test development, to evaluate item quality, and to improve test validity. However, this process usually requires extensive pilot studies with human participants. If large language models (LLMs) exhibit human-like response behavior to test items, this could open up the possibility of using them as pilot participants to accelerate test development. In this paper, we evaluate the human-likeness or psychometric plausibility of responses from 18 instruction-tuned LLMs with two publicly available datasets of multiple-choice test items across three subjects: reading, U.S. history, and economics. Our methodology builds on two theoretical frameworks from psychometrics which are commonly used in educational assessment, classical test theory and item response theory. The results show that while larger models are…
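Item response theory instead models the probability of a correct answer as a function of a latent ability θ. The following is a minimal sketch that assumes the two-parameter logistic (2PL) model, since the excerpt does not say which IRT model the paper fits. A test taker with ability θ answers item j, which has discrimination a_j and difficulty b_j, correctly with probability P = 1 / (1 + exp(-a_j (θ - b_j))). The code evaluates this curve and estimates one test taker's ability by maximizing the likelihood over a grid. Function names and grid bounds are illustrative choices.

```python
import math


def p_correct_2pl(theta, a, b):
    """2PL item characteristic curve: P(correct | theta) for discrimination a and difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def log_likelihood(theta, responses, a, b):
    """Log-likelihood of a 0/1 response pattern given ability theta and item parameters."""
    ll = 0.0
    for x, ai, bi in zip(responses, a, b):
        p = p_correct_2pl(theta, ai, bi)
        ll += math.log(p) if x else math.log(1.0 - p)
    return ll


def estimate_ability(responses, a, b, lo=-4.0, hi=4.0, steps=801):
    """Grid-search maximum-likelihood ability estimate.

    An all-correct (or all-wrong) pattern has no finite maximum, so the
    estimate then sits at the grid boundary (hi or lo).
    """
    grid = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    return max(grid, key=lambda t: log_likelihood(t, responses, a, b))


# Hypothetical item parameters for three items of increasing difficulty.
a = [1.0, 1.2, 0.8]
b = [-1.0, 0.0, 1.0]
theta_hat = estimate_ability([1, 1, 0], a, b)
```

Grid search keeps the example dependency-free. In practice, item parameters are typically estimated first (for example by marginal maximum likelihood) with dedicated IRT software.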
Taxonomy
Topics: Psychometric Methodologies and Testing · Student Assessment and Feedback · Intelligent Tutoring Systems and Adaptive Learning
