IRT scoring and the principle of consistent order
Nancy Lacourly, Jaime San Martin, Monica Silva, Paula Uribe

TL;DR
This study investigates whether the principle of consistent order holds when scoring high-stakes tests with IRT models, revealing that it often does not, which raises concerns about fairness and transparency.
Contribution
It demonstrates that the principle of consistent order is violated in practice when using 2PL and 3PL IRT models for scoring high-stakes tests.
Findings
The principle of consistent order does not hold in actual IRT scoring.
Students who answer as many or more items of greater difficulty can receive lower scores than students who answer easier items.
Complex IRT models may compromise fairness and transparency.
Abstract
Item response theory (IRT) models are increasingly used worldwide for test construction and scoring. The study examines the practical implications of estimating individual scores in a paper-and-pencil high-stakes test using 2PL and 3PL models, specifically whether the principle of consistent order holds when scoring with IRT. The principle states that student A, who answers the same (or a larger) number of items of greater difficulty than student B, should outscore B. Results of analyses conducted using actual scores from the Chilean national admission test in mathematics indicate the principle does not hold when scoring with 2PL or 3PL models. Students who answer more items, and items of greater difficulty, may be assigned lower scores. The findings can be explained by examining the mathematical models, since estimated ability scores are an increasing function of the accumulated estimated discriminations for…
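The mechanism the abstract points to can be illustrated with a small sketch. It relies on a standard property of the 2PL model with known item parameters: the maximum-likelihood ability estimate depends on the response pattern only through the discrimination-weighted sum of correct answers. The item parameters, the two response patterns, and the helper names `p_correct` and `mle_theta` below are hypothetical, chosen for illustration; they are not taken from the Chilean test data or from the authors' analysis.

```python
import math

def p_correct(theta, a, b):
    """2PL probability of a correct answer for ability theta,
    discrimination a, and difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def mle_theta(responses, a_params, b_params, lo=-10.0, hi=10.0, iters=100):
    """Maximum-likelihood ability estimate under the 2PL model.

    Solves sum_i a_i * (x_i - P_i(theta)) = 0 by bisection. The left-hand
    side is strictly decreasing in theta, so the root is unique. Assumes a
    mixed response pattern (neither all correct nor all wrong); otherwise
    the estimate is not finite.
    """
    def score(theta):
        return sum(a * (x - p_correct(theta, a, b))
                   for x, a, b in zip(responses, a_params, b_params))

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid   # root lies above mid
        else:
            hi = mid   # root lies at or below mid
    return 0.5 * (lo + hi)

# Hypothetical item bank: the easy items discriminate strongly,
# the hard items discriminate weakly.
a_params = [2.0, 1.5, 0.5, 0.4]    # discriminations
b_params = [-1.0, 0.0, 1.0, 2.0]   # difficulties

student_a = [0, 0, 1, 1]  # answers the two hardest items correctly
student_b = [1, 1, 0, 0]  # answers the two easiest items correctly

theta_a = mle_theta(student_a, a_params, b_params)
theta_b = mle_theta(student_b, a_params, b_params)

# Discrimination-weighted sums of correct answers.
weighted_a = sum(a * x for a, x in zip(a_params, student_a))
weighted_b = sum(a * x for a, x in zip(a_params, student_b))

print(f"A: weighted sum = {weighted_a:.2f}, theta = {theta_a:.3f}")
print(f"B: weighted sum = {weighted_b:.2f}, theta = {theta_b:.3f}")
```

Both hypothetical students answer two items correctly, and A's items are strictly harder, yet B receives the higher ability estimate because B's items carry larger discriminations. This is the kind of ordering reversal the principle of consistent order rules out.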
Taxonomy
Topics: Psychometric Methodologies and Testing · Student Assessment and Feedback · School Choice and Performance
