Examining Exams Using Rasch Models and Assessment of Measurement Invariance
Achim Zeileis

TL;DR
This paper explores the use of Rasch IRT models to evaluate exam fairness and measurement invariance, demonstrating methods with a first-year math exam and providing practical R tutorials.
Contribution
It introduces recent methods for assessing measurement invariance and differential item functioning in Rasch models, applied to real exam data with comprehensive R tutorials.
Findings
Identified potential biases in exam items across subgroups
Demonstrated the application of psychometric models to real exam data
Provided practical tools for assessing exam fairness
Abstract
Many statisticians regularly teach large lecture courses on statistics, probability, or mathematics for students from other fields such as business and economics, social sciences and psychology, etc. The corresponding exams often use a multiple-choice or single-choice format and are typically evaluated and graded automatically, either by scanning printed exams or via online learning management systems. Although further examinations of these exams would be of interest, these are frequently not carried out. For example, a measurement scale for the difficulty of the questions (or items) and the ability of the students (or subjects) could be established using psychometric item response theory (IRT) models. Moreover, based on such a model it could be assessed whether the exam is really fair for all participants or whether certain items are easier (or more difficult) for certain subgroups of…
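To make the idea of a joint scale for item difficulty and student ability concrete, here is a minimal sketch of the standard Rasch model, in which the probability of a correct answer depends only on the difference between ability and difficulty. It is not taken from the paper, whose tutorials use R; the function name and all numeric values below are hypothetical and chosen purely for illustration.

```python
import math


def rasch_probability(ability: float, difficulty: float) -> float:
    """Probability of a correct answer under the Rasch model.

    Standard Rasch form: logistic(ability - difficulty), with both
    parameters on the same logit scale.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


# Hypothetical abilities and difficulties (illustration only, not exam data).
abilities = {"weaker student": -1.0, "average student": 0.0, "stronger student": 1.0}
difficulties = {"easy item": -1.0, "medium item": 0.0, "hard item": 1.0}

for student, theta in abilities.items():
    for item, beta in difficulties.items():
        p = rasch_probability(theta, beta)
        print(f"{student:16s} {item:12s} P(correct) = {p:.3f}")

# Differential item functioning (DIF), illustrated: the same item has a
# different difficulty in two subgroups, so students of equal ability
# face different chances of solving it. Values are hypothetical.
difficulty_group_a = 0.0
difficulty_group_b = 0.8
same_ability = 0.5
print(rasch_probability(same_ability, difficulty_group_a))
print(rasch_probability(same_ability, difficulty_group_b))
```

In this sketch, measurement invariance corresponds to every item having the same difficulty in all subgroups; the last two lines show what a violation looks like for a single item.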
Taxonomy
Topics: Online Learning and Analytics
