Evaluating Four Methods for Detecting Differential Item Functioning in Large-Scale Assessments with More Than Two Groups
Dandan Chen Kaptur, Jinming Zhang

TL;DR
This study compares four methods for detecting differential item functioning in large-scale assessments with multiple groups, focusing on their error rates and power through simulation.
Contribution
It provides a comprehensive evaluation of four DIF detection methods under various conditions, highlighting the effectiveness of the RMSD approach with model-predicted cutoffs.
Findings
The RMSD approach had the best Type-I error control when used with model-predicted cutoffs.
The RMSD approach was overly conservative when used with the commonly used cutoff of 0.1.
Implications for educational assessment practices are discussed.
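The two criteria behind these findings, Type-I error rate and power, can be estimated from simulation output as sketched below. This is only an illustration, not the authors' code: the function name `error_rate_and_power`, the layout of the `flags` array (replications by items), and the toy data are all assumptions made for the example.

```python
import numpy as np

def error_rate_and_power(flags, is_dif):
    """Estimate empirical Type-I error rate and power.

    flags  : array of shape (replications, items); nonzero means the
             item was flagged as DIF in that replication (hypothetical layout).
    is_dif : boolean array of shape (items,); True for items simulated
             with DIF.
    """
    flags = np.asarray(flags, dtype=bool)
    is_dif = np.asarray(is_dif, dtype=bool)
    # Type-I error: share of flags among items simulated WITHOUT DIF.
    type1 = flags[:, ~is_dif].mean()
    # Power: share of flags among items simulated WITH DIF.
    power = flags[:, is_dif].mean()
    return float(type1), float(power)

# Toy data: 4 replications, 4 items, the last two items carry DIF.
flags = np.array([
    [0, 0, 1, 1],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
])
is_dif = np.array([False, False, True, True])

type1, power = error_rate_and_power(flags, is_dif)
print(f"Type-I error: {type1:.3f}, power: {power:.3f}")
```

A method is usually judged conservative when its empirical Type-I error rate falls well below the nominal level, which tends to come at the cost of power.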
Abstract
This study evaluated four multi-group differential item functioning (DIF) methods (the root mean square deviation [RMSD] approach, Wald-1, the generalized logistic regression procedure, and the generalized Mantel-Haenszel method) via Monte Carlo simulation under controlled testing conditions. These conditions varied in the number of groups, the ability and sample size of the DIF-contaminated group, the parameter associated with DIF, and the proportion of DIF items. Comparing the Type-I error rates and power of the methods, we showed that the RMSD approach yielded the best Type-I error rates when used with model-predicted cutoff values. The same approach was overly conservative when used with the commonly used cutoff value of 0.1. Implications for future research and for educational researchers and practitioners are discussed.
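To make the RMSD cutoff comparison concrete, here is a minimal sketch of an RMSD-type statistic. It assumes the common formulation in which RMSD is the ability-weighted root mean squared difference between a group's item characteristic curve and the model-implied curve. The abstract does not give the authors' exact computation, so the 2PL curve, the quadrature grid, the normal weights, and the names `icc_2pl` and `rmsd` are all illustrative assumptions.

```python
import numpy as np

def icc_2pl(theta, a, b):
    """Two-parameter logistic item characteristic curve (assumed model)."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def rmsd(p_group, p_model, weights):
    """Weighted root mean squared deviation between two curves.

    weights approximates the group's ability density on the grid and is
    normalized here so that it sums to 1.
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    diff = np.asarray(p_group, dtype=float) - np.asarray(p_model, dtype=float)
    return float(np.sqrt(np.sum(w * diff ** 2)))

# Ability grid with standard-normal weights (illustrative choice).
theta = np.linspace(-4.0, 4.0, 81)
weights = np.exp(-0.5 * theta ** 2)

# Model-implied curve shared across groups.
p_model = icc_2pl(theta, a=1.0, b=0.0)

# One group matches the model; another has a shifted difficulty (uniform DIF).
p_same = icc_2pl(theta, a=1.0, b=0.0)
p_shift = icc_2pl(theta, a=1.0, b=0.5)

rmsd_no_dif = rmsd(p_same, p_model, weights)
rmsd_dif = rmsd(p_shift, p_model, weights)

# Compare against the commonly used fixed cutoff of 0.1.
cutoff = 0.1
print(f"no DIF: {rmsd_no_dif:.4f}, shifted difficulty: {rmsd_dif:.4f}")
print("flagged at 0.1:", rmsd_dif > cutoff)
```

Because the statistic is bounded by the largest gap between the two curves, a moderate difficulty shift can land near or below 0.1, which is consistent with a fixed cutoff of 0.1 behaving conservatively. A model-predicted cutoff would replace the constant `cutoff` with a value that depends on the testing conditions.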
Taxonomy
Topics: Psychometric Methodologies and Testing · Resilience and Mental Health · Educational and Psychological Assessments
